Epigenetics 2010: A Collection from the PLoS Journals

www.ploscollections.org/epigenetics2010

Image Credit: http://commons.wikimedia.org/wiki/File:Nucleosome_1KX5_2.png

Produced with support from New England Biolabs. The PLoS journal editors have sole responsibility for the content of this collection.


UNDERSTANDING CHANGE

New tools to advance epigenetics research

For over 35 years, New England Biolabs has been committed to understanding the mechanisms of restriction and methylation of DNA. This expertise in enzymology has recently led to the development of a suite of validated products for epigenetics research. These unique solutions to study DNA methylation are designed to address some of the challenges of the current methods. EpiMark™ validated reagents simplify epigenetics research and expand the potential for biomarker discovery.

EpiMark™ validated products include:
• Newly discovered methylation-dependent restriction enzymes
• A novel kit for 5-hmC and 5-mC analysis and quantitation
• Methyltransferases
• Histones
• Genomic DNAs

To learn how these products can help you to better understand epigenetic changes, visit neb.com/epigenetics.

Simplify DNA methylation analysis with MspJI

[Figure: gel of HeLa, plant (maize), and yeast genomic DNA incubated with (+) or without (–) MspJI; the 32 bp fragment is marked.]

MspJI recognizes methylated and hydroxymethylated DNA and cleaves out 32 bp fragments for sequencing analysis. Overnight digestion of 1 µg of genomic DNA from various sources with or without MspJI is shown. Note: Yeast DNA does not contain methylated DNA, therefore no 32-mer is detected.

www.neb.com


Primer 1

Centromeres Convert but Don’t Cross

Paul B. Talbert, Steven Henikoff*

Howard Hughes Medical Institute, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America

A long-standing problem in chromosome biology concerns the dynamic nature of centromeres. These chromosomal sites assemble the protein machines called kinetochores that connect chromosomes to the spindle microtubules for segregation to daughter cells during mitosis and meiosis. In multicellular eukaryotes, centromeres are typically composed of highly homogeneous tandem repeats that evolve rapidly despite their highly conserved function [1]. For tandem repeats to evolve, a mutation must spread by some recombinational process, but a persistent dogma is that centromeres do not undergo homologous chromosome recombination (the shuffling of genetic segments between chromosomal pairs). New evidence [2] challenges this dogma and addresses the problem of rapidly evolving centromeres.

The Role of Crossing Over in Meiosis

Centromeres do not act alone in orchestrating chromosome segregation. In order for sister kinetochores to properly disjoin (separate) and segregate chromosomes equally to daughter cells in mitosis, their sister chromatids must be linked so that the pulling forces from the two halves of the spindle generate tension to correctly orient the kinetochores, stabilize kinetochore attachments, and signal that kinetochores are ready to disjoin. Centromeres in multicellular eukaryotes are typically embedded in heterochromatin, the permanently condensed chromatin found around centromeres, in contrast to the euchromatic chromosome arms, which decondense between mitoses. Heterochromatin has been implicated in facilitating cohesion of sister chromatids around the centromere. This cohesion is mediated by cohesins, proteins that link the sisters together and that are enriched around centromeres [3], and possibly also by catenation (interlocking) of DNA threads observed between sister centromeres [4].

In most eukaryotes, homologs become physically linked during meiosis through the recombinational process of “crossing over”: the breakage and reciprocal reunion of homologous chromatids, resulting in a chiasma, the point where recombinant chromatids cross over each other (Figure 1). Failure to cross over is a major source of non-disjunction (improper segregation) at the first meiotic division in animals [5,6], underscoring the importance of chiasmata for segregation of homologs. As early as 1930, observations on the distribution of chiasmata along chromosomes led Karl Sax to predict that crossing over (and hence genetic recombination) is reduced around the centromere [7], and this “centromere effect” was verified in the fruit fly Drosophila melanogaster soon afterward [8]. Suppression of crossing over around or in centromeres has since been verified in several animals [9,10], plants [11–14], and fungi [15,16], with estimates of crossover suppression ranging from 5-fold to >200-fold in different organisms.

Why is crossing over suppressed around centromeres? In Drosophila [5], humans [6], and budding yeast (Saccharomyces cerevisiae) [17], non-disjunction events at the second meiotic division are enriched in centromere-proximal crossovers. This suggests that crossovers that are too close to the centromere disrupt pericentric sister chromatid cohesion, leading to premature separation of sister chromatids, which then segregate randomly. Thus, selective pressure to reduce crossing over near the centromere is likely to be strong. Crossing over within the centromere itself could be even more deleterious, leading to attachment of the centromere to both halves of the spindle, resulting in chromosome breakage and loss.

Centromeres, Heterochromatin, and Crossover Suppression

How is crossing over suppressed at centromeres? The location of centromeres in heterochromatin raises the possibility that the crossover suppression seen at the centromere may simply be a property of the surrounding heterochromatin. Early attempts to separate heterochromatin from the centromere utilized inversions of pericentric heterochromatin on the Drosophila X chromosome and suggested that the centromere can suppress recombination independently of its flanking heterochromatin [18]. Subsequent work confirmed that heterochromatin also suppresses crossing over [19], consistent with its proposed role in facilitating cohesion. An increase in crossovers in Drosophila mutants that affect heterochromatin structure supports the role of heterochromatin in suppressing pericentric crossovers [20].

Crossover suppression in plants also appears to be a feature of both centromeres and flanking heterochromatin. In Arabidopsis thaliana, crossing over is reduced >200-fold in the 2.3-Mb centromere region of Chromosome I, and 10- to 50-fold in the 1-Mb heterochromatic flanking regions [12].

At the molecular level, centromeres are distinguished from both heterochromatin and euchromatin by specialized nucleosomes containing the centromere-specific histone H3 variant known as CENP-A or CenH3, which is necessary to form the kinetochore. Occasionally functional CenH3-containing centromeres can arise on DNA that was previously non-centromeric and be faithfully transmitted (neocentromeres), indicating that centromere inheritance is epigenetic, dependent on the presence of CenH3 nucleosomes, not on specific DNA sequences (reviewed in [1]). Despite the apparent irrelevance of centromeric DNA sequence to kinetochore function, natural centromeres in plants and animals are usually composed of Mb-sized tandem arrays of short

Citation: Talbert PB, Henikoff S (2010) Centromeres Convert but Don’t Cross. PLoS Biol 8(3): e1000326. doi:10.1371/journal.pbio.1000326
Published March 9, 2010
Copyright: © 2010 Talbert, Henikoff. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work was supported by the Howard Hughes Medical Institute. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing Interests: The authors have declared that no competing interests exist.
Abbreviations: CenH3, centromeric histone H3 variant; CRM2, centromere-specific retrotransposon of maize 2; LD, linkage disequilibrium
* E-mail: steveh@fhcrc.org

March 2010 | Volume 8 | Issue 3 | e1000326


Figure 1. Chromosome connections in meiosis. Kinetochores attach homologous chromosomes to opposite halves of the spindle. Homologs are held together by chiasmata, in which recombinant chromatids cross each other. Sisters are held together by cohesins and possibly by catenation of centromeric DNA threads, which have been observed in human mitosis. Cohesion is released in two steps: on chromosome arms to resolve chiasmata and separate homologs in the first meiotic division; and around centromeres to separate sisters in the second meiotic division. doi:10.1371/journal.pbio.1000326.g001

Figure 2. Unequal exchange in satellite arrays. Identical tandem satellite repeats become diversified over time by mutation. Unequal exchange results in gain or loss of tandem repeats. Repeated exchange can lead to homogenization of satellite repeats (left). If the unit of exchange consists of multiple diverged monomers, higher-order repeats are generated (right). doi:10.1371/journal.pbio.1000326.g002

PLoS Biology | www.plosbiology.org


(150–180 bp) noncoding “satellite” repeats. These arrays may also be rich in transposon insertions, probably because suppression of crossing over prevents their elimination through recombination. The same or similar repeats comprise the flanking pericentric heterochromatin, underscoring the epigenetic specification of centromeres by CenH3 nucleosomes.

Although both centromeres and pericentric heterochromatin are rich in repetitive elements, repeats per se do not appear to be necessary for crossover suppression. For example, centromere 8 of rice (Oryza sativa), which has very little satellite DNA, lacks detectable crossovers in a 2.3-Mb span around the 750-kb centromere region that contains discontinuous blocks of CenH3-containing nucleosomes. Remarkably, there is little difference in gene activity, transposon composition, or abundance of common histone modifications between this recombination-free region and adjacent recombining regions [21], suggesting that crossover suppression does not depend on DNA sequence but instead is epigenetic.

A clearer separation of centromere and heterochromatin effects can be found in budding yeast, which is unusual in having “point” centromeres that are only ~120 bp in length [22], harbor a single CenH3 nucleosome [23], and lack surrounding heterochromatin. Suppression of crossing over at yeast centromeres is modest, estimated at only 3- to 6-fold, and extends over only about 10 kb or less [15,24], although this represents as much as 80 times the length of the centromere itself. This suppression is eliminated by a point mutation in the centromere that renders it unable to assemble a functional kinetochore [25], strongly suggesting that the kinetochore mediates suppression.

Satellite Arrays and Recombination

Although crossing over is suppressed around centromeres, the tandem satellite array structure that is typical for most centromeres is best explained by extensive and repeated recombination. The generation of such arrays has been modeled as a recombinational process of random unequal exchange [26]. Acting on mutational variation among individual satellite monomers, unequal exchange can expand new repeat variants, form higher-order repeats (Figure 2), or eliminate variation among monomers (homogenization).

In the human X chromosome, the CenH3-containing chromatin is found centrally in the most recent and most homogeneous higher-order repeats of the human alpha satellite array, whereas the older and more diversified satellite monomers comprise the flanking pericentric heterochromatin [27]. Analysis of the CentO satellites in centromeres of rice revealed segmental duplications, insertions and deletions, inversions, and reshuffling of variant satellite monomers [28]. Unequal exchange occurs at a high frequency between sister centromeres in mitotically cycling mouse (Mus musculus) chromosomes and is negatively regulated by DNA methylation, without which loss of repeats occurs [29]. However, it is unknown whether these recombination events can be transmitted through meiosis to the next generation.

These observations provide evidence of extensive recombination in centromeres over evolutionary time scales and underscore the instability of repeat arrays to recombination and the necessity of suppressing crossing over in order to maintain centromere structure. How can this evidence for recombination in centromeres be reconciled with crossover suppression?
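The unequal-exchange process described in this section can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the model of [26]: monomers are represented as integer variant labels, each exchange picks an independent random breakpoint on each sister array, one recombinant product is retained at random, and an arbitrary size window stands in for selection on array length. The function names and parameters are hypothetical.

```python
import random

def unequal_exchange(array_a, array_b, rng):
    """One out-of-register exchange between two sister arrays.

    Each array is a list of monomer variant labels. A breakpoint is chosen
    independently on each sister, so the two recombinant products generally
    differ in length: one gains repeats and the other loses them.
    """
    i = rng.randint(0, len(array_a))  # breakpoint on sister A
    j = rng.randint(0, len(array_b))  # breakpoint on sister B
    product_1 = array_a[:i] + array_b[j:]
    product_2 = array_b[:j] + array_a[i:]
    return product_1, product_2

def simulate(n_monomers=20, generations=2000, seed=0):
    """Follow one lineage through repeated sister exchanges (no new mutation)."""
    rng = random.Random(seed)
    # Assumption: start from fully diverged monomers, each a distinct variant.
    lineage = list(range(n_monomers))
    for _ in range(generations):
        sister = list(lineage)  # replication yields an identical sister
        p1, p2 = unequal_exchange(lineage, sister, rng)
        candidate = rng.choice([p1, p2])
        # Assumption: arrays outside an arbitrary size window are not retained.
        if n_monomers // 2 <= len(candidate) <= n_monomers * 2:
            lineage = candidate
    return lineage

if __name__ == "__main__":
    final = simulate()
    print("array length:", len(final), "distinct variants:", len(set(final)))
```

Because this sketch introduces no new mutations, each exchange can only duplicate or delete existing monomers, so the number of distinct variants never increases. That is the homogenizing tendency the text attributes to repeated exchange; adding a mutation step would be needed to model diversification as well.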

Figure 3. Gene conversion. In a popular model for gene conversion [41], recombination begins with a double-strand break in one chromosome (red) and resection (chewing back) of the 5′ ends of the break. A free 3′ end invades the homolog (blue) forming a D-loop and heteroduplex DNA. Non-reciprocal DNA synthesis fills in missing DNA (dashed arrows), forming two Holliday junctions, which may be resolved as either crossovers or noncrossovers, depending on which strands are cut (green and orange arrows). Gene conversion between homologs takes place in meiosis (bottom right), generating both crossovers and noncrossovers. Centromeres might undergo noncrossover conversion in mitotically cycling cells during growth and development (bottom left and center) as part of double-strand break repair. Conversion between homologs would be necessary to repair breaks prior to replication, when there is no cohering sister centromere to use as a repair template. doi:10.1371/journal.pbio.1000326.g003

Conversion in Centromeres

In the same year that Sax predicted the centromere effect on crossing over, a new model of recombination, called gene conversion, was proposed to explain non-reciprocal recombination events in mosses and basidiomycetes [30]. Gene conversion is now thought to be a normal part of the homologous recombination pathway in which a programmed double-strand break in the DNA is repaired by copying a short (usually ~2 kb or less) stretch of the homologous chromosome. The resulting conversion event may then be resolved into either a crossover or a noncrossover (Figure 3). Could noncrossover gene conversions contribute to recombination in centromeres in the absence of crossing over?

The localized nature of gene conversion makes it significantly more difficult to detect than crossing over. A key problem is the need for numerous closely spaced unique markers in the highly repetitive sequences of the centromere and pericentromere. Consequently this question has been most thoroughly addressed in budding yeast, which lacks centromeric and pericentric repeats. Most studies have concluded that gene conversion is moderately suppressed (4- to 7-fold) at yeast centromeres, along with crossing over [24,25]. However, initiating double-strand breaks are not found within the point centromeres, but rather nearby [31,32]. One study reported that when nearby conversion events were examined, the conversion tract frequently included part or all of the centromere, and concluded that conversion rates at centromeres were not different than in non-centromeric regions [33]. Thus, the small size of yeast centromeres means that the relationship between the kinetochore and suppression of gene conversion has remained ambiguous.

To determine whether gene conversion events can occur within large centromeres and provide the recombination events underlying both satellite homogenization and centromere diversity, a new report by Shi et al. [2] studied events within the centromeres of maize (Zea mays). They developed 238 centromeric markers based on insertion polymorphisms of the centromere-specific transposon CRM2 that map to all ten maize centromeres. To verify their centromeric location, centromeric chromatin was immunoprecipitated with an anti-CenH3 antibody. CenH3 is distributed discontinuously in maize centromeres [34] and only about 30% of CRM sequences can be immunoprecipitated with anti-CenH3 [34–36]. Markers were then assessed in two parental lines and in 94 recombinant inbred lines derived from their progeny. As expected, no crossovers were observed. However, in two cases a single marker from one parent was gained in a centromere with all markers of the other parent, indicating a conversion event. It is formally possible, but unlikely, that these events represent double crossovers, given the failure to find single crossovers.

Shi et al. then proceeded to assess their marker set in 53 highly diverse inbred lines representing the diversity of maize and found widespread evidence for marker recombination since the origin of maize, perhaps 9,000 years ago [37]. They could distinguish between crossovers and noncrossover conversions by determining the linkage disequilibrium (LD), or tendency of markers in a population to occur together on the same chromosome. With crossing over, LD decreases with the distance between markers, whereas the short conversion tracts of noncrossovers show no relationship between LD and distance, because the conversion of one marker ordinarily has no effect on the coinheritance of its neighbors. No correlation was found between distance and LD in centromere 2, which has been fully sequenced [36], consistent with noncrossover conversion. Two population genetic methods gave similar estimates of the conversion rate of >1 × 10⁻⁵ conversions per marker per generation, a rate similar to one estimate for the conversion rate on the chromosome arms [38].

These results are significant both for understanding the regulation of recombination in maize and for understanding the evolution of centromeres. Except in yeasts, studying recombination in centromeres has hitherto been largely a matter of inferring the occurrence of ancient events based on present-day sequences. The results of Shi et al. show that it is possible to study centromeric recombination in action in a multicellular eukaryote. They also confirm that such recombination can take place between homologs and not solely between sisters, with implications for the creation and spread of new centromere variants.

Meiotic recombination involves complete end-to-end pairing of homologs, whereas a gene conversion event requires only a local homologous interaction, and it is possible that the observed conversion events occurred during mitotic development rather than during meiosis (Figure 3). For example, the mitotic threads seen to connect human sister centromeres [4] might sometimes be resolved via breakage events that initiate repair by homologous recombination. By this scenario, the surprisingly high level of genetic exchange observed by Shi et al. might be a consequence of the many mitoses that occur for each meiotic generation within a maize lineage.

Widespread gene conversion might be a general feature of centromeres of multicellular eukaryotes. Human centromeres are composed of higher-order alpha satellite repeat arrays [27], and evidence for their periodic homogenization suggests an underlying gene conversion mechanism [39]. As is the case for unequal exchange between sisters, which is the most attractive explanation for the large expansions and contractions of alpha satellite repeat arrays, centromeric gene conversion challenges the widely held perception of centromeres as genetically stable regions of the genome. The actions of gene conversion and unequal exchange provide variation that makes possible Darwinian competition of centromeres that may lead to their rapid diversification [40]. Thus the problem of both homogenization and diversification of centromeres in the absence of crossovers can be resolved.
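The LD-versus-distance test described in this section, which separates crossovers from noncrossover conversions, can be sketched as follows. This is a minimal illustration and not the analysis of Shi et al. [2]: it assumes presence/absence (0/1) marker scores across inbred lines, uses r² as the LD statistic (the text does not say which statistic was used), and summarizes the relationship with a plain Pearson correlation. All function names are hypothetical.

```python
from itertools import combinations
from math import sqrt

def r_squared(x, y):
    """LD (r^2) between two markers scored 0/1 across the same lines."""
    n = len(x)
    p_x = sum(x) / n
    p_y = sum(y) / n
    p_xy = sum(1 for a, b in zip(x, y) if a == 1 and b == 1) / n
    denom = p_x * (1 - p_x) * p_y * (1 - p_y)
    if denom == 0:
        return None  # a monomorphic marker carries no LD information
    d = p_xy - p_x * p_y
    return d * d / denom

def pearson(u, v):
    """Pearson correlation; assumes both inputs vary (non-zero variance)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    sd_u = sqrt(sum((a - mean_u) ** 2 for a in u))
    sd_v = sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (sd_u * sd_v)

def ld_distance_correlation(positions, genotypes):
    """Correlate pairwise r^2 with pairwise physical distance.

    positions: marker coordinates in bp.
    genotypes: one 0/1 list per marker, with one entry per inbred line.
    """
    distances, lds = [], []
    for i, j in combinations(range(len(positions)), 2):
        r2 = r_squared(genotypes[i], genotypes[j])
        if r2 is not None:
            distances.append(abs(positions[i] - positions[j]))
            lds.append(r2)
    return pearson(distances, lds)
```

On this reading, a clearly negative correlation would point to crossing over, whereas a correlation near zero would be consistent with short noncrossover conversion tracts. A real analysis would also need a significance test and would have to account for population structure among the lines.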

References

1. Malik HS, Henikoff S (2009) Major evolutionary transitions in centromere complexity. Cell 138: 1067–1082.
2. Shi J, Wolf SE, Burke JM, Presting GG, Ross-Ibarra J, et al. (2010) Widespread gene conversion in centromere cores. PLoS Biol 8(3): e1000327. doi:10.1371/journal.pbio.1000327.
3. Gartenberg M (2009) Heterochromatin and the cohesion of sister chromatids. Chromosome Res 17: 229–238.
4. Wang LH, Schwarzbraun T, Speicher MR, Nigg EA (2008) Persistence of DNA threads in human anaphase cells suggests late completion of sister chromatid decatenation. Chromosoma 117: 123–135.
5. Koehler KE, Boulton CL, Collins HE, French RL, Herman KC, et al. (1996) Spontaneous X chromosome MI and MII nondisjunction events in Drosophila melanogaster oocytes have different recombinational histories. Nat Genet 14: 406–414.
6. Lamb NE, Sherman SL, Hassold TJ (2005) Effect of meiotic recombination on the production of aneuploid gametes in humans. Cytogenet Genome Res 111: 250–255.
7. Sax K (1930) Chromosome structure and the mechanism of crossing over. J Arnold Arb 11: 193–220.
8. Beadle GW (1932) A possible influence of the spindle fibre on crossing-over in Drosophila. Proc Natl Acad Sci U S A 18: 160–165.
9. Mahtani MM, Willard HF (1998) Physical and genetic mapping of the human X chromosome centromere: Repression of recombination. Genome Res 8: 100–110.
10. Rahn MI, Solari AJ (1986) Recombination nodules in the oocytes of the chicken, Gallus domesticus. Cytogenet Cell Genet 43: 187–193.
11. Sherman JD, Stack SM (1995) Two-dimensional spreads of synaptonemal complexes from solanaceous plants. VI. High-resolution recombination nodule map for tomato (Lycopersicon esculentum). Genetics 141: 683–708.
12. Haupt W, Fischer TC, Winderl S, Fransz P, Torres-Ruiz RA (2001) The centromere1 (CEN1) region of Arabidopsis thaliana: Architecture and functional impact of chromatin. Plant J 27: 285–296.
13. Harushima Y, Yano M, Shomura A, Sato M, Shimano T, et al. (1998) A high-density rice genetic linkage map with 2275 markers using a single F2 population. Genetics 148: 479–494.
14. Anderson LK, Doyle GG, Brigham B, Carter J, Hooker KD, et al. (2003) High-resolution crossover maps for each bivalent of Zea mays using recombination nodules. Genetics 165: 849–865.
15. Lambie EJ, Roeder GS (1986) Repression of meiotic crossing over by a centromere (CEN3) in Saccharomyces cerevisiae. Genetics 114: 769–789.
16. Nakaseko Y, Adachi Y, Funahashi S, Niwa O, Yanagida M (1986) Chromosome walking shows a highly homologous repetitive sequence present in all the centromere regions of fission yeast. EMBO J 5: 1011–1021.
17. Rockmill B, Voelkel-Meiman K, Roeder GS (2006) Centromere-proximal crossovers are associated with precocious separation of sister chromatids during meiosis in Saccharomyces cerevisiae. Genetics 174: 1745–1754.
18. Mather K (1939) Crossing over and heterochromatin in the X chromosome of Drosophila melanogaster. Genetics 24: 413–435.
19. Slatis HM (1955) A reconsideration of the brown-dominant position effect. Genetics 40: 246–251.
20. Westphal T, Reuter G (2002) Recombinogenic effects of suppressors of position-effect variegation in Drosophila. Genetics 160: 609–621.
21. Yan H, Jin W, Nagaki K, Tian S, Ouyang S, et al. (2005) Transcription and histone modifications in the recombination-free region spanning a rice centromere. Plant Cell 17: 3227–3238.
22. Carbon J, Clarke L (1984) Structural and functional analysis of a yeast centromere (CEN3). J Cell Sci Suppl 1: 43–58.
23. Furuyama S, Biggins S (2007) Centromere identity is specified by a single centromeric nucleosome in budding yeast. Proc Natl Acad Sci U S A 104: 14706–14711.
24. Chen SY, Tsubouchi T, Rockmill B, Sandler JS, Richards DR, et al. (2008) Global analysis of the meiotic crossover landscape. Dev Cell 15: 401–415.
25. Lambie EJ, Roeder GS (1988) A yeast centromere acts in cis to inhibit meiotic gene conversion of adjacent sequences. Cell 52: 863–873.
26. Smith GP (1976) Evolution of repeated DNA sequences by unequal crossover. Science 191: 528–535.
27. Schueler MG, Dunn JM, Bird CP, Ross MT, Viggiano L, et al. (2005) Progressive proximal expansion of the primate X chromosome centromere. Proc Natl Acad Sci U S A 102: 10563–10568.
28. Ma J, Wing RA, Bennetzen JL, Jackson SA (2007) Plant centromere organization: A dynamic structure with conserved functions. Trends Genet 23: 134–139.
29. Jaco I, Canela A, Vera E, Blasco MA (2008) Centromere mitotic recombination in mammalian cells. J Cell Biol 181: 885–892.
30. Winkler H (1930) Die Konversion der Gene. Jena, Germany: Gustav Fischer.
31. Blitzblau HG, Bell GW, Rodriguez J, Bell SP, Hochwagen A (2007) Mapping of meiotic single-stranded DNA reveals double-stranded-break hotspots near centromeres and telomeres. Curr Biol 17: 2003–2012.
32. Buhler C, Borde V, Lichten M (2007) Mapping meiotic single-strand DNA reveals a new landscape of DNA double-strand breaks in Saccharomyces cerevisiae. PLoS Biol 5(12): e324. doi:10.1371/journal.pbio.0050324.
33. Symington LS, Petes TD (1988) Meiotic recombination within the centromere of a yeast chromosome. Cell 52: 237–240.
34. Jin W, Melo JR, Nagaki K, Talbert PB, Henikoff S, et al. (2004) Maize centromeres: Organization and functional adaptation in the genetic background of oat. Plant Cell 16: 571–581.
35. Zhong CX, Marshall JB, Topp C, Mroczek R, Kato A, et al. (2002) Centromeric retroelements and satellites interact with maize kinetochore protein CENH3. Plant Cell 14: 2825–2836.
36. Wolfgruber TK, Sharma A, Schneider KL, Albert PS, Koo DH, et al. (2009) Maize centromere structure and evolution: Sequence analysis of centromeres 2 and 5 reveals dynamic loci shaped primarily by retrotransposons. PLoS Genet 5(11): e1000743. doi:10.1371/journal.pgen.1000743.
37. Ranere AJ, Piperno DR, Holst I, Dickau R, Iriarte J (2009) The cultural and chronological context of early holocene maize and squash domestication in the Central Balsas River Valley, Mexico. Proc Natl Acad Sci U S A 106: 5014–5018.
38. Yandeau-Nelson MD, Zhou Q, Yao H, Xu X, Nikolau BJ, et al. (2005) MuDR transposase increases the frequency of meiotic crossovers in the vicinity of a mu insertion in the maize a1 gene. Genetics 169: 917–929.
39. Brown SD, Dover GA (1980) Conservation of segmental variants of satellite DNA of Mus musculus in a related species: Mus spretus. Nature 285: 47–49.
40. Henikoff S, Ahmad K, Malik HS (2001) The centromere paradox: Stable inheritance with rapidly evolving DNA. Science 293: 1098–1102.
41. Szostak JW, Orr-Weaver TL, Rothstein RJ, Stahl FW (1983) The double-strand-break repair model for recombination. Cell 33: 25–35.



Primer 1

Genomic Responses to Abnormal Gene Dosage: The X Chromosome Improved on a Common Strategy

Xinxian Deng1, Christine M. Disteche1,2*

1 Department of Pathology, University of Washington, Seattle, Washington, United States of America
2 Department of Medicine, University of Washington, Seattle, Washington, United States of America

Mechanisms to guard genomic integrity are critical to ensuring the welfare and survival of an organism. Disruptions of genomic integrity can result in aneuploidy, a large-scale genomic imbalance caused by either extra or missing whole chromosomes (chromosomal aneuploidy) or chromosome segments (segmental aneuploidy). A change in dosage of a single gene may not compromise the well-being of an organism, but the combined altered dosage of many genes due to aneuploidy disturbs the overall balance of gene expression networks, resulting in decreased fitness and in mortality [1,2]. Chromosomal aneuploidy is a common cause of birth defects: Down syndrome is caused by an extra copy of Chromosome 21, and Turner syndrome by a single copy of the X chromosome in females. Furthermore, methods that detect segmental aneuploidy have uncovered small deletions or duplications of the genome in association with many disorders, such as mental retardation. Chromosomal and segmental aneuploidies are also frequent in cancer cells in which changes in copy number paradoxically increase cell fitness but are unfavorable to survival of the organism. A fundamental issue in biology and medicine is to understand the effects of aneuploidy on gene expression and the mechanisms that alleviate aneuploidy-induced imbalance of the genome.

Chromosomal aneuploidy is caused by non-disjunction of chromosomes in meiosis or mitosis, while segmental aneuploidy involves breakage and ligation of DNA. In contrast, the sex chromosomes provide an example of a naturally occurring “aneuploidy” caused by the evolution of a specific set of chromosomes for sex determination that often differ in their copy number between males and females. For example, in mammals and in flies, females have two X chromosomes and males have one X chromosome and a Y chromosome, resulting in X monosomy in males. How does a cell or an organism respond to such different types of aneuploidy, abnormal or natural? It turns out that the overall expression level of a given gene is not necessarily in direct relation to the copy number. Unique strategies have evolved to deal with abnormal gene dosage to alleviate the effects of aneuploidy by dampening changes in expression levels. Moreover, the X chromosome has evolved sophisticated mechanisms to achieve complete dosage compensation, which is not surprising given that the copy number difference between males and females has been evolving for a long time.
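As a rough numerical illustration of how expression may or may not track copy number, the sketch below computes expression relative to the normal two-copy state. It is a toy model, not taken from the article: the `compensation` parameter and the linear interpolation between proportional and fully compensated expression are assumptions made purely for illustration.

```python
def relative_expression(copies, normal_copies=2, compensation=0.0):
    """Expression relative to wild type for a gene present in `copies` copies.

    compensation = 0.0 -> expression scales with copy number;
    compensation = 1.0 -> expression stays at the wild-type level;
    values in between model partial compensation (linear interpolation,
    an assumption made purely for illustration).
    """
    if not 0.0 <= compensation <= 1.0:
        raise ValueError("compensation must be between 0 and 1")
    proportional = copies / normal_copies
    return proportional + compensation * (1.0 - proportional)

if __name__ == "__main__":
    for copies in (1, 2, 3):
        row = [relative_expression(copies, compensation=c) for c in (0.0, 0.5, 1.0)]
        print(copies, row)
```

With no compensation, one copy gives half the wild-type level and three copies give 1.5 times that level; with complete compensation, both return to the wild-type level of 1.0. Dampening, in this picture, is any value of `compensation` above zero.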

Gene Expression Responses to Altered Dosage in Aneuploidy

There are two main outcomes from altered gene dosage in aneuploidy in terms of transcript levels: either levels directly correlate with gene dosage (primary dosage effect) or they are unchanged or only partially changed with gene dosage (complete or partial dosage compensation) [3]. In the first scenario, a reduction of the normal gene dosage in a wild-type (WT) diploid cell from a symbolic dose value of 2 to a value of 1 after a chromosomal loss or deletion would produce half as many gene products, while an increase in gene dosage from 2 to 3, due to a chromosomal gain or duplication, would produce 1.5 times as many products (Figure 1). In the second scenario, the amount of products from altered gene dosage would either equal or nearly equal that in WT cells, due to complete or partial compensation (Figure 1).

Gene expression analyses of aneuploid cells or tissues in human, mouse, fly, yeast, and plant provide examples of both primary dosage effects and dosage compensation. Hence, changes in expression levels due to chromosomal aneuploidy do not affect all genes in the same manner. For example, in Down syndrome, 29% of transcripts from human Chromosome 21 are overexpressed (22% in proportion to gene dosage and 7% with higher expression), while the remaining transcripts are either partially compensated (56%) or highly variable among individuals (15%) [4]. Interestingly, dosage-sensitive genes, such as genes encoding transcription factors or ribosomal proteins, are more likely to be compensated to avoid harmful network imbalances [1,5]. This basal dynamic dosage compensation could be due to buffering, feedback regulation, or both, depending on the gene and the organism [4,6–9]. Buffering, a passive process of absorption of gene dose perturbations, is due to inherent non-linear properties of the transcription system. In contrast, feedback regulation is an active mechanism that detects abnormal transcript abundance and adjusts transcription levels.

Sex Chromosome-Specific Dosage Compensation

Sex chromosome-specific dosage compensation evolved in response to the dose imbalance between autosomes and sex chromosomes in the heterogametic sex due to the different number of sex chromosomes between the sexes, for example, a single X chromosome and a gene-poor Y chromosome in males and two X chromosomes in females. Compensatory mechanisms that restore balance both between the sex chromosomes and autosomes and between the sexes vary among species [10,11]. In Drosophila melanogaster (fruit fly), expression from the single X

Citation: Deng X, Disteche CM (2010) Genomic Responses to Abnormal Gene Dosage: The X Chromosome Improved on a Common Strategy. PLoS Biol 8(2): e1000318. doi:10.1371/journal.pbio.1000318
Published February 23, 2010
Copyright: © 2010 Deng, Disteche. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This work was supported by National Institutes of Health grants GM079537 and GM046883 (to CMD). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

Abbreviations: ES, embryonic stem; MOF, males absent on the first; MSL, male-specific lethal; WT, wild-type

* E-mail: cdistech@u.washington.edu


February 2010 | Volume 8 | Issue 2 | e1000318


chromosome–specific mechanism. The beauty of their experimental system, the S2 cell line derived from a male fly, is that it has a defined genome with numerous segmental aneuploid regions, both autosomal and X-linked. Thus, genomic responses to aneuploidy could be queried both on autosomes and on the X chromosome, the latter being associated with the MSL complex. Using second-generation DNA- and RNA-sequencing, the authors carefully examined the relationship between gene copy number and gene expression in S2 cells before and after induced depletion of the MSL complex. By this approach the effects of the MSL complex on the genome have effectively been separated from those triggered by a basal response to aneuploidy. What Zhang et al. have found is that partial dosage compensation of both autosomal and X-linked regions occurs even in the absence of the MSL complex. This provides strong evidence that basal dosage compensation mediated by buffering and feedback pathways allows dosage compensation across the whole genome. In the presence of the MSL complex, X-linked genes, but not autosomal genes, become subject to an additional level of regulation, which increases expression independent of gene copy or expression levels. This feed-forward regulation of the X chromosome by the MSL complex ensures a highly stable doubling of expression specific to this chromosome. Note that this feed-forward regulation results in precise dosage compensation only when X dose is half of the autosome dose, while insufficient or excessive X-linked gene expression occurs at lower or higher X dose. Excessive X expression has also been reported when ectopic expression of MSL2 is induced in Drosophila females, which leads to binding of the MSL complex to both X chromosomes and lethality [16]. The new findings by Zhang et al.
implicate two levels of regulation of the X chromosome: one basal mechanism that can regulate both the X and the autosomes in the event of aneuploidy; and a second feed-forward mechanism specific to the X and regulated by the MSL complex to ensure doubling of X-linked gene expression (Figure 2). The new study proposes that the basal compensation mechanism provides a 1.5-fold increase in gene expression and the feed-forward mechanism, another 1.35-fold, resulting in a precise two-fold increase in expression of X-linked genes. The specificity of the MSL-mediated mechanism to double X-linked gene expression is ensured by the existence of DNA sequence motifs specifically enriched on the X chromosome to recruit the MSL complex only to this chromosome [14]. Autosomal aneuploidy would only trigger a response of the basal dosage compensation pathway, which would result in a 1.5-fold increase in expression of genes located within a monosomic segment (Figure 2). It should be noted that since gene expression levels were measured relative to whole genome expression (due to normalization), a fold change in expression of genes in an aneuploid segment could also be interpreted as a fold change in expression of the rest of the genome. How did such a precise mechanism evolve to ensure appropriate expression of sex-linked genes? The feed-forward process mediated by the MSL complex is a highly stable epigenetic modification selected and maintained during the evolution of heteromorphic sex chromosomes (Figure 2). Heteromorphic sex chromosomes have arisen from an ancestral pair of autosomes, following inhibition of recombination between the proto-Y chromosome that carries the male determinant and its counterpart, the proto-X chromosome [13]. Gradual loss of Y-linked genes due to lack of recombination could have happened gene-by-gene or on a chromosomal segment-by-segment basis.
The human Y chromosome apparently evolved by a series of large inversions leading to a rapid loss of large chromosomal segments [17]. If the lost Y segments contained dosage sensitive

Figure 1. Expression levels change in response to altered gene dose in aneuploidy. The transcript output from a given pair of chromosomes in normal WT diploid cells is set as a value of 2. In case of aneuploidy (monosomy or trisomy), the amount of transcript would be strictly correlated with gene dose in the absence of a dosage compensation mechanism (No DC). In the presence of partial DC, the expression level per copy would be partially increased in monosomy or partially decreased in trisomy, relative to the diploid level. In the presence of complete DC, expression levels would be adjusted so that the amount of transcripts is the same in monosomic or trisomic cells compared to diploid cells. doi:10.1371/journal.pbio.1000318.g001
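The three scenarios in Figure 1 can be made concrete with a short numerical sketch. The function name, the `compensation` parameter, and the linear interpolation between "no DC" and "complete DC" are illustrative assumptions, not a model taken from the article; only the reference values (diploid output set to 2, monosomy giving 1 and trisomy giving 3 without compensation) come from the text.

```python
def transcript_output(copies: int, compensation: float = 0.0,
                      diploid_output: float = 2.0) -> float:
    """Expected transcript output for a region present in `copies` copies.

    compensation = 0.0 -> no dosage compensation (output tracks gene dose)
    compensation = 1.0 -> complete compensation (output equals diploid level)
    Values in between are a simple linear interpolation (an assumption made
    here for illustration only).
    """
    if not 0.0 <= compensation <= 1.0:
        raise ValueError("compensation must lie between 0 and 1")
    uncompensated = diploid_output * copies / 2
    return uncompensated + compensation * (diploid_output - uncompensated)


if __name__ == "__main__":
    for label, copies in (("monosomy", 1), ("diploid", 2), ("trisomy", 3)):
        row = [transcript_output(copies, c) for c in (0.0, 0.5, 1.0)]
        print(label, "no DC / partial DC / complete DC:", row)
```

With `compensation=0.5`, a monosomic region yields an output of 1.5 rather than 1, which corresponds to the partial increase in per-copy expression that the caption describes.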

chromosome is specifically enhanced two-fold in males, while no such upregulation occurs in females. X upregulation also occurs in Caenorhabditis elegans (round worm) and in mammals but in both sexes [6,12]. Silencing of one X chromosome in mammalian females and partial repression of both X chromosomes in C. elegans hermaphrodites have been adapted to avoid too high an expression level of X-linked genes in the homogametic sex. A unified theme in these diverse mechanisms of sex chromosome dosage compensation is coordinated upregulation of most X-linked genes approximately two-fold to balance their expression with that of autosomal genes present in two copies. This process utilizes both genetic and epigenetic mechanisms to increase expression of an X-linked gene once it has lost its Y-linked partner during evolution. While the mechanisms of X upregulation in mammals and worms are not clear, Drosophila X upregulation is mediated by the male-specific lethal (MSL) complex [10,13]. The MSL complex binds hundreds of sites along the male X chromosome and modifies its chromatin structure by MOF (males absent on the first)–mediated acetylation of histone H4 at lysine 16. Other histone modifications and chromatin-associated proteins, including both activating and silencing factors, are also involved in the two-fold upregulation of the Drosophila male X chromosome [14]. How these modifications coordinately work to fine-tune a doubling of gene expression is still not well understood. Moreover, the basal dynamic dosage compensation response observed in studies of autosomal aneuploidy could also play a role in Drosophila X upregulation [3]. An important question is how much this basal response to the onset of aneuploidy contributes to sex chromosome–specific dosage compensation.

Fine-Tuning of the Drosophila X Chromosome Adds a Special Layer of Regulation above a Genome-Wide Response to Aneuploidy

In this issue of PLoS Biology, Zhang et al. [15] report that the exquisitely precise X chromosome upregulation in Drosophila utilizes both a basal response to aneuploidy and an X



Figure 2. Evolutionary model of sex chromosome dosage compensation compared to the basal compensation response of an autosome after a deletion. After the proto-Y chromosome evolved a gene with a male-determining function (green bar), it became subject to gradual gene loss on a gene-by-gene or segment-by-segment basis due to lack of recombination between the proto-sex chromosomes. If the lost region on the proto-Y chromosome contained dosage-sensitive genes such as those that encode transcription factors (yellow bars), this would have triggered a basal dosage compensation response (yellow faucet) on the proto-X chromosome and led to a partial (1.5-fold) increase of expression (small arrows). The same basal dosage compensation process would also modify a deleted region on an autosome (A) in an abnormal cell. Dosage-insensitive genes (black bars) may escape this process. When broader regions were lost on the proto-Y chromosome, the collective imbalance effects of multiple aneuploid genes would have become highly deleterious and the increased load of aneuploidy could have stressed the basal mechanism of dosage compensation. Survival was achieved by recruiting regulatory complexes such as the MSL complex (red faucet) to aneuploid X segments (red regions), to further increase gene expression (big arrows) and rescue the X monosomy. This feed-forward sex chromosome–specific regulation would provide a 1.35-fold increase in expression, which together with the basal dosage compensation (1.5-fold increase) would achieve the approximate two-fold upregulation of most genes on the present day X chromosome. In contrast, large-scale deleterious autosomal aneuploidy would be lost due to lack of a specific sex-driven compensatory mechanism. doi:10.1371/journal.pbio.1000318.g002

genes, this would probably have triggered a basal dosage compensation response as observed in autosomal aneuploidy (Figure 2). However, this type of dosage compensation is dynamic and incomplete, as it is probably mediated by buffering or feedback mechanisms. An organism might tolerate partial imbalances as long as those were small, but extensive gene loss from the Y chromosome would eventually have caused a deleterious collective imbalance for multiple X-linked genes. A progressive increase in the size of aneuploid X regions could have reached a threshold of unsustainable stress on the basal dosage compensation process. To relieve this stress and survive X aneuploidy, specific mechanisms of dosage compensation targeted to the X chromosome would be desirable. Such mechanisms probably derived by recruiting pre-existing regulatory complexes, for example in the making of the MSL complex in Drosophila. Indeed, one of the components of this complex is MOF, a histone acetyltransferase also involved in autosomal gene regulation [10,13]. Homologues of Drosophila MSL proteins also exist in other organisms where they are involved in gene regulation and DNA replication and repair but do not appear to associate with the X chromosome, suggesting that the components of X chromosome–specific complexes may differ between organisms [18].

In conclusion, two mechanisms apparently collaborate to achieve the approximate two-fold upregulation of the Drosophila X chromosome: a dynamic basal dosage compensation mechanism probably mediated by buffering and feedback processes; and a feed-forward, sex chromosome–specific regulation chiefly mediated by the MSL complex. In mammals, upregulation of the X chromosome may also result from a combination of more than one mechanism, some applicable to aneuploidy that may arise anywhere in the genome and others that evolved to control the X chromosome. High X-linked gene expression in mammalian cells with two active X chromosomes, namely undifferentiated female embryonic stem (ES) cells [19] and human triploid cells [20], suggests that X upregulation does not default in these cells. Thus, in mammals, X upregulation may also be mediated by a highly stable feed-forward mechanism that acts on top of a basal aneuploidy response. In contrast, the sex chromosomes of birds and silkworms, ZZ in males and ZW in females, seem to lack a precise dosage compensation mechanism of the Z chromosome, possibly due to the absence of a feed-forward process [21,22]. The



Z chromosome could have a biased paucity of dosage-sensitive regulatory genes, or else selection for sexual traits may have favored the retention of gene expression imbalances between males and females. Male and female mammals display significant expression differences of a subset of genes that escape X inactivation and thus have higher expression in females [23]. Whether such genes play a role in female-specific functions is unknown. Future work to uncover the actual molecular mechanisms underlying the basal and feed-forward regulatory pathways should help to fully understand the role of these processes in different organisms, both in response to the acute onset of aneuploidy and in evolution of sex-specific traits. Loss or dysregulation of dosage compensation mechanisms could be important in birth defects and in diseases, such as cancer, where aneuploidy is common; exploring approaches to enhance dosage compensation may be useful to relieve aneuploidy-related diseases.

References

1. Birchler JA, Riddle NC, Auger DL, Veitia RA (2005) Dosage balance in gene regulation: biological implications. Trends Genet 21: 219–226.
2. Veitia RA, Bottani S, Birchler JA (2008) Cellular reactions to gene dosage imbalance: genomic, transcriptomic and proteomic effects. Trends Genet 24: 390–397.
3. Zhang Y, Oliver B (2007) Dosage compensation goes global. Curr Opin Genet Dev 17: 113–120.
4. Ait Yahya-Graison E, Aubert J, Dauphinot L, Rivals I, Prieur M, et al. (2007) Classification of human chromosome 21 gene-expression variations in Down syndrome: impact on disease phenotypes. Am J Hum Genet 81: 475–491.
5. Edger PP, Pires JC (2009) Gene and genome duplications: the impact of dosage-sensitivity on the fate of nuclear genes. Chromosome Res 17: 699–717.
6. Gupta V, Parisi M, Sturgill D, Nuttall R, Doctolero M, et al. (2006) Global analysis of X-chromosome dosage compensation. J Biol 5: 3.
7. FitzPatrick DR (2005) Transcriptional consequences of autosomal trisomy: primary gene dosage with complex downstream effects. Trends Genet 21: 249–253.
8. Stenberg P, Lundberg LE, Johansson AM, Ryden P, Svensson MJ, et al. (2009) Buffering of segmental and chromosomal aneuploidies in Drosophila melanogaster. PLoS Genet 5: e1000465.
9. Makarevitch I, Phillips RL, Springer NM (2008) Profiling expression changes caused by a segmental aneuploid in maize. BMC Genomics 9: 7.
10. Straub T, Becker PB (2007) Dosage compensation: the beginning and end of generalization. Nat Rev Genet 8: 47–57.
11. Cheng MK, Disteche CM (2006) A balancing act between the X chromosome and the autosomes. J Biol 5: 2.
12. Nguyen DK, Disteche CM (2006) Dosage compensation of the active X chromosome in mammals. Nat Genet 38: 47–53.
13. Vicoso B, Bachtrog D (2009) Progress and prospects toward our understanding of the evolution of dosage compensation. Chromosome Res 17: 585–602.
14. Gelbart ME, Kuroda MI (2009) Drosophila dosage compensation: a complex voyage to the X chromosome. Development 136: 1399–1410.
15. Zhang Y, Malone JH, Powell SK, Periwal V, Spana E, et al. (2010) Expression in aneuploid Drosophila S2 cells. PLoS Biol 8(2): e1000320. doi:10.1371/journal.pbio.1000320.
16. Kelley RL, Solovyeva I, Lyman LM, Richman R, Solovyev V, et al. (1995) Expression of msl-2 causes assembly of dosage compensation regulators on the X chromosomes and female lethality in Drosophila. Cell 81: 867–877.
17. Lahn BT, Page DC (1999) Four evolutionary strata on the human X chromosome. Science 286: 877–879.
18. Rea S, Xouri G, Akhtar A (2007) Males absent on the first (MOF): from flies to humans. Oncogene 26: 5385–5394.
19. Lin H, Gupta V, Vermilyea MD, Falciani F, Lee JT, et al. (2007) Dosage compensation in the mouse balances up-regulation and silencing of X-linked genes. PLoS Biol 5: e326.
20. Deng X, Nguyen DK, Hansen RS, Van Dyke DL, Gartler SM, Disteche CM (2009) Dosage regulation of the active X chromosome in human triploid cells. PLoS Genet 5: e1000751.
21. Arnold AP, Itoh Y, Melamed E (2008) A bird’s-eye view of sex chromosome dosage compensation. Annu Rev Genomics Hum Genet 9: 109–127.
22. Zha X, Xia Q, Duan J, Wang C, He N, et al. (2009) Dosage analysis of Z chromosome genes using microarray in silkworm, Bombyx mori. Insect Biochem Mol Biol 39: 315–321.
23. Prothero KE, Stahl JM, Carrel L (2009) Dosage compensation and gene expression on the mammalian X chromosome: one plus one does not always equal two. Chromosome Res 17: 637–648.



Research in Translation

Epigenetic Epidemiology of Common Complex Disease: Prospects for Prediction, Prevention, and Treatment

Caroline L. Relton1*, George Davey Smith2

1 Human Nutrition Research Centre, Institute of Human Genetics, Newcastle University, Newcastle upon Tyne, United Kingdom, 2 MRC Centre for Causal Analyses in Translational Epidemiology, School of Social and Community Medicine, University of Bristol, Bristol, United Kingdom

Introduction

There is considerable anticipation of future improvements in disease prevention and treatment following recent advances in genomics [1]. One aspect of genomics that is receiving considerable interest is epigenetics: the regulatory processes that control the transcription of information encoded in the DNA sequence into RNA before their translation into proteins. Programmed developmental changes and the ability of the genome to register, signal, and perpetuate environmental cues are subsumed under the epigenetic banner [2]. Genes are packaged into chromatin and dynamic chromatin remodeling processes are required for the initial step in gene expression (transcription), achieved by altering the accessibility of gene promoters and regulatory regions [3]. Epigenetic factors are responsible for this regulatory process, the major components of which are DNA methylation, histone modifications, and the action of small non-coding RNAs (Figure 1). Unlike DNA sequence, which is largely fixed throughout the lifecourse, epigenetic patterns not only vary from tissue to tissue but alter with advancing age and are sensitive to environmental exposures [4–7]. It is this propensity for change that makes epigenetic processes the focus of such interest, as they lie at the interface of the environment and co-ordinated transcriptional control.

In rare developmental disorders, the role of aberrant epigenetic processes is well established [8]. Our focus here, however, is on the potential role of epigenetic processes in the context of common complex disease. Tumor-specific changes in epigenetic patterns are a hallmark of numerous cancers, with analysis of the epigenetic machinery beginning to feature prominently in emerging cancer diagnostics and therapies [9–11]. There is an increasing body of evidence to demonstrate that epigenetic patterns are altered by environmental factors known to be associated with disease risk (e.g., diet, smoking, alcohol intake, environmental toxicants, stress) [7,8]; however, an important question remains to be resolved in defining which epigenetic changes are a secondary outcome of either exposure or disease, and which lie on the causal pathway linking the two. Without proven causality, interventions to prevent or treat common complex diseases based upon epigenetic mechanisms will not be fruitful. Conversely, regardless of causality, defining a robust prospective relationship between epigenetic patterns and phenotypic traits may have application in diagnostics or in identifying high-risk individuals for non-epigenetic-based interventions.

Measurement of Epigenetic Patterns

Epigenetic patterns, including histone modifications, microRNA (miRNA), and DNA methylation, can be assessed in a range of tissue types. As DNA methylation assays on stored DNA samples are straightforward, this has been extensively studied [12]. Histone modification analysis requires that DNA is maintained as intact chromatin, whereas analysis of miRNA requires a source of RNA. Planned prospective collection for such analyses is necessary, and both are costly to undertake on sizable sample sets. The N-terminal tails of the four core histones (H2A, H2B, H3, and H4) commonly exhibit post-translational modifications, including acetylation, methylation, or phosphorylation [13]. These histone modifications can be analysed following precipitation of chromatin, and subsequent use of an antibody to a specific modification, e.g., methylation of histone 3, lysine 9 (H3-K9). miRNA expression levels can be measured using the same principles and methods as regular transcriptomic analysis (miRNA array or qPCR). DNA methylation can be assayed through genome-wide approaches where the investigator is interested in global changes or in identifying regions of interest [14], or targeted approaches that focus on DNA methylation at a particular locus or loci associated with genes in a specific pathway [15]. These technologies are reviewed in detail elsewhere [16].

The tissue specificity of epigenetic patterns is a well-established phenomenon, with variation between tissues within individuals being greater than variation between individuals [5]. Furthermore, epigenetic dysregulation with advancing age has been shown to be highly tissue dependent [17]. Extrapolating epigenetic information gleaned from DNA from

Citation: Relton CL, Davey Smith G (2010) Epigenetic Epidemiology of Common Complex Disease: Prospects for Prediction, Prevention, and Treatment. PLoS Med 7(10): e1000356. doi:10.1371/journal.pmed.1000356

Published October 26, 2010

Copyright: © 2010 Relton, Davey Smith. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: The authors received no specific funding for this article.

Competing Interests: The authors have declared that no competing interests exist.

Research in Translation discusses health interventions in the context of translation from basic to clinical research, or from clinical evidence to practice.

Abbreviations: CAD, coronary artery disease; CpG, cytosine guanine dinucleotide; DNMT, DNA methyltransferase; HDAC, histone deacetylase; HNSCC, head and neck squamous cell carcinoma; LDL-C, low density lipoprotein-cholesterol; miRNA, microRNA; SNP, single nucleotide polymorphism

* E-mail: c.l.relton@ncl.ac.uk

Provenance: Commissioned; externally peer reviewed.

PLoS Medicine | www.plosmedicine.org


October 2010 | Volume 7 | Issue 10 | e1000356


Summary Points

• The epigenome records a variety of dietary, lifestyle, behavioral, and social cues, providing an interface between the environment and the genome.
• Epigenetic variation, whether genetically or environmentally determined, contributes to inter-individual variation in gene expression and thus to variation in common complex disease risk.
• Interventions based upon epigenetic agents, including DNA methyltransferase inhibitors and histone deacetylase inhibitors, have been in clinical use for many years, but their role outside treatment of specific cancers is not established.
• Epigenetic therapies will only be fruitful if epigenetic mechanisms are causally related to the disease being treated. Evidence linking epigenetic variation to specific disease phenotypes to date is lacking.
• Epidemiological approaches can be applied to help separate causal from noncausal associations. We propose the development of a Mendelian randomization approach ("genetical epigenomics"), which could help overcome the problems of confounding and reverse causation (when an association between epigenetic patterns and disease phenotype is observed but it is unknown whether the disease is causing changes to the epigenome or epigenetic changes are causal in disease pathogenesis).
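The logic of the Mendelian randomization proposal in the last summary point can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data are simulated, all effect sizes and variable names are hypothetical, and the estimator shown is the standard Wald ratio (genotype-phenotype covariance divided by genotype-methylation covariance), which is one common way to implement the idea and is not described in the article itself. It also assumes that the genetic variant influences the phenotype only through methylation and is independent of the confounder.

```python
import random


def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def simulate(n=20000, true_effect=0.5, seed=1):
    """Simulate genotype, methylation and phenotype with an unmeasured confounder."""
    rng = random.Random(seed)
    genotype, methylation, phenotype = [], [], []
    for _ in range(n):
        g = sum(rng.random() < 0.3 for _ in range(2))  # allele count 0, 1 or 2
        u = rng.gauss(0, 1)                            # confounder (e.g., lifestyle)
        m = 0.8 * g + u + rng.gauss(0, 1)              # methylation level
        y = true_effect * m + u + rng.gauss(0, 1)      # phenotype
        genotype.append(g)
        methylation.append(m)
        phenotype.append(y)
    return genotype, methylation, phenotype


g, m, y = simulate()
# Observational estimate: regression slope of phenotype on methylation (confounded).
naive_estimate = covariance(m, y) / covariance(m, m)
# Wald ratio: uses genotype as an instrument for methylation.
wald_estimate = covariance(g, y) / covariance(g, m)

if __name__ == "__main__":
    print("naive slope:", round(naive_estimate, 3))
    print("Wald ratio :", round(wald_estimate, 3))
```

In this simulated setting the observational slope is inflated by the shared confounder, while the genotype-based ratio stays close to the effect that was built into the simulation.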

accessible sources such as peripheral white blood or buccal cells to other tissue types is therefore problematic. The correlation between methylation patterns in different tissues is complex and locus dependent, but data that are beginning to emerge suggest that epigenetic signatures on easily accessible material such as circulating cells have potential utility as biomarkers of exposure or disease risk [18]. Epigenetic patterns are heritable across cell divisions (mitosis) [19], but undergo comprehensive but incompletely understood reprogramming during meiosis [20]. Evidence that environmental exposures can act across generations to influence epigenetic patterns in offspring exists [21], with maternal exposure to famine during the perinatal period influencing offspring DNA methylation in adulthood [22,23]. The quantitative importance of such intergenerational epigenetic transmission remains uncertain, and may have been over-emphasized in comparison with the theoretically less challenging but probably more tractable and important intra-generational epigenetic influences [24].

Environmental Influences on Epigenetic Patterns

Several other factors beyond tissue type and age [4,5,17,25,26] are believed to influence epigenetic patterns. Nutritional factors modulate epigenetic marks in both animal models and humans (reviewed by [27]), with dietary sources of methyl groups, including folate, choline, betaine, methionine, and serine, which are required for DNA methylation [28,29], having been most studied. In animal and human studies these modulate epigenetic patterns in disease and non-disease settings. Other dietary components with evidence for an effect on epigenetic patterns relevant to the pathogenesis of common complex diseases include the influence of a high-fat diet on DNA methylation [30] and various dietary modifiers of histone deacetylase (HDAC) activity such as isothiocyanates, butyrate, and diallyl disulfide [31,32]. miRNA levels have also been observed to be altered following dietary modulation, with miRNA expression in human muscle being increased following a dietary challenge of essential amino acids [33].

The most widely studied lifestyle influence on epigenetic patterns is smoking. It has been associated with global hypomethylation in DNA [34] as well as gene-specific hypermethylation [35] in tumor tissues in head and neck squamous cell carcinoma (HNSCC). Animal models suggest that epigenetic changes arise in lung tissue following short-term exposure to tobacco smoke condensate [36] and precede histopathological changes. Exposure to tobacco smoke is also believed to alter expression of DNA methyltransferase (DNMT) enzymes [37,38] and modulate histone modifications, including acetylation and methylation [39]. In addition, miRNAs have been proposed as modulators of smoking-induced changes in gene expression in human airway epithelium [40], and studies in rodent models have demonstrated that chemopreventive agents can protect the lung tissue from smoke exposure-induced changes in

miRNA expression [41]. Maternal cigarette smoking during pregnancy influences DNA methylation patterns in offspring [42,43], pointing to a vulnerability of the epigenome to environmental exposures during the intrauterine period. Animal studies have shown that chronic alcohol consumption is associated with reduced genomic DNA methylation in the colon [44], although evidence from human studies is equivocal. Alcohol-induced shifts in DNA methylation patterns could arise through perturbation of one-carbon metabolism and interference with methyl group donation (reviewed by [45]). The molecular actions of ethanol are also thought to involve site-specific changes to histone modifications, exemplified by a recent study of alcohol exposure during adolescence [46]. Epigenetic processes could also influence patterns of alcohol drinking, with emerging evidence suggesting that alcohol-sensitive miRNAs control the development of tolerance and subsequent alcohol addiction [47]. The alcohol-related miRNA responses may in turn reflect alcohol-induced changes in DNA methylation [48]. Air pollutants such as air particulate matter and airborne benzene exposure levels have been associated with changes in DNA methylation in genes involved in inflammation and carcinogenesis [49,50]. Endocrine disruptors (vinclozolin, bisphenol A) and various heavy metals (arsenic, mercury, cadmium) are among other compounds present in the environment that have been implicated in epigenetic changes, including altered histone methylation [21]. Most epigenetic studies of environmental toxins have focused on the potential of DNA methylation patterns as biological markers of exposure rather than establishing epigenetic mechanisms as being causally related to a specific disease. Studies have, however, suggested a role for miRNAs in mediating the effects of exposure to black carbon on disease [51].
Several infectious agents, including Helicobacter pylori [52] and Epstein-Barr virus [53], have been shown to induce epigenetic changes, either directly or secondary to inflammation. Epigenetic modulation is recognized as an aetiological component in chronic inflammatory diseases such as rheumatoid arthritis and multiple sclerosis [54]. Inflammation also plays an important role in a wide range of diseases such as cancers, obesity, and atopic disorders, and epigenetic changes may be causal in disease pathogenesis [54]. There is increasing evidence that epigenetic mechanisms contribute to the transcriptional regulation of inflammatory responses [55].


Figure 1. Epigenetic modifications. Chromosomes are composed of chromatin, consisting of DNA wrapped around eight histone protein units. Each DNA-bound histone octamer is a nucleosome. Histone tails protruding from histone proteins are decorated with modifications, including phosphorylation (Ph), methylation (Me), and acetylation (Ac). DNA molecules are methylated by the addition of a methyl group to carbon position 5 on cytosine bases when positioned adjacent to a guanine base (CpG sites), a reaction catalyzed by DNA methyltransferase enzymes. DNA methylation maintains repressed gene activity. Transcription involves the conversion of DNA to messenger RNA (mRNA), which is usually repressed by DNA methylation and histone deacetylation. mRNA is translated into a protein product, but this process can be repressed by binding of microRNA (miRNA) to mRNA. Each miRNA binds to the mRNA of up to 200 gene targets. miRNAs can also be involved in establishing DNA methylation and may influence chromatin structure by regulating histone modifiers. doi:10.1371/journal.pmed.1000356.g001

Perhaps the most widely celebrated example of the influence of environmental conditions (other than diet) on the epigenome relates to maternal postnatal nurturing and epigenetically mediated alterations to the hypothalamic-pituitary-adrenal response to stress [56]. Variations in maternal signals alter gene expression and complex behavioral phenotypes in rodent offspring through a well-defined mechanism involving the epigenetic regulation of the glucocorticoid receptor gene within the target tissue. A further example of modulation of epigenetic patterns in a target tissue is that of increased histone acetylation in human muscle biopsy tissue following exercise [57], providing evidence that chromatin remodeling might be important in mediating longer-term responses to exercise. miRNA involvement in exercise-induced changes to gene expression has also been reported [58].

Genetic Influences on Epigenetic Patterns

Twin- and family-based studies have demonstrated that variation in epigenetic patterns, including both chromatin states [59] and DNA methylation [25,60,61], is heritable. Much inter-individual variation in epigenetic patterns can be explained by common genetic variation [62], with a recent study estimating that 6.5% of the variance in methylation at the IGF2 (insulin-like growth factor 2) locus could be explained by five single nucleotide polymorphisms (SNPs) [63]. A genome-wide association study considering DNA methylation in human brain tissue as a quantitative trait identified both cis and trans genetic effects upon DNA methylation at cytosine-guanine dinucleotide (CpG) sites, the predominant influences being in cis, defined as SNPs influencing methylation at CpG sites within 1 Mb of themselves [64]. Similar cis effects have been reported in whole blood DNA [25].
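The cis/trans distinction can be made concrete with a minimal sketch. The function name and the genomic coordinates below are hypothetical and chosen only for illustration; the 1 Mb window follows the definition quoted from [64], and treating a distance of exactly 1 Mb as cis is an assumption made here.

```python
CIS_WINDOW_BP = 1_000_000  # 1 Mb window, following the definition quoted from [64]


def classify_effect(snp_chrom, snp_pos, cpg_chrom, cpg_pos, window=CIS_WINDOW_BP):
    """Label a SNP-CpG pair as 'cis' or 'trans'.

    A pair is 'cis' when both features lie on the same chromosome and are
    no more than `window` base pairs apart (inclusive boundary is an
    assumption); every other pair is 'trans'.
    """
    if snp_chrom == cpg_chrom and abs(snp_pos - cpg_pos) <= window:
        return "cis"
    return "trans"


# Hypothetical coordinates, for illustration only.
pairs = [
    ("chr11", 2_150_000, "chr11", 2_160_000),  # same chromosome, 10 kb apart
    ("chr11", 2_150_000, "chr11", 9_000_000),  # same chromosome, far apart
    ("chr11", 2_150_000, "chr7", 2_150_000),   # different chromosomes
]
labels = [classify_effect(*pair) for pair in pairs]
print(labels)
```

Only the first pair falls inside the window on a shared chromosome, so it alone is labeled cis.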


Figure 2. Defining the causal relationship between epigenetic patterns and phenotype. Analysis of the respective relationships between DNA methylation (CpG), body mass index (BMI), and cardiovascular disease (CVD) can help to inform the direction of causality. Observed associations between BMI and CpG, and between CpG and CVD, cannot by themselves show which of the depicted scenarios applies. doi:10.1371/journal.pmed.1000356.g002

Greater knowledge of the genetic determinants of DNA methylation, histone modifications, and miRNA activity will transform our understanding of the mechanisms involved in the establishment and maintenance of epigenetic patterns, with such genetic influences undoubtedly contributing to observed inter-individual differences in gene expression [65].

Despite the relatively large body of evidence that disease-related environmental exposures are associated with epigenetic alterations, there remain few compelling data to support the link between epigenetic variation and common complex disease phenotypes (other than cancer). Investigation of parent-of-origin effects on risk of common complex disease has suggested a role of perturbed DNA methylation [66]. Adequately powered studies relating epigenetic profiles to both exposure and disease are in their infancy, but it is highly likely that a myriad of such associations will be identified, and the major issue will be identifying meaningful and useful associations within this tsunami of data. Epigenetic measures are phenotypic, not genotypic, and as with phenotypic measures in general, non-causal associations will be the rule rather than the exception [67]. As with conventional epidemiological investigations, separating causal from non-causal associations will become an important task (Figure 2).

“Genetical Epigenomics”: Identifying Causal Relationships between Exposure, Epigenetic Patterns, and Disease

Using germ-line genetic variation as a proxy for environmental exposures provides a route to strengthening causal inference within observational data [68–70]. The rationale is that genetic variants are not, in general, related to the socioeconomic, behavioral, and physiological factors that confound associations in conventional observational epidemiology [67], nor are they altered by disease processes and thus subject to reverse causation. The Mendelian randomization approach can be extended to the interrogation of epigenetic variation as potential mediators of the influence of a modifiable exposure on disease outcomes, and thus appropriate targets for disease prevention. Mendelian randomization methods can be applied to many categories of environmentally modifiable exposures to help define whether their relationship with phenotype is causal. For example, with respect to behavioral factors, it has been used in a proof-of-principle manner to demonstrate associations of alcohol intake with esophageal [71] and head and neck cancers [72], as well as to considerably strengthen evidence on the associations of alcohol intake with blood pressure [73]. The method has particular promise when applied to circulating intermediate phenotypes, the manipulation of which can potentially prevent disease. Again, as proof-of-principle, an increasing number of genetic variants that are associated with low density lipoprotein-cholesterol (LDL-C) level are also associated with coronary artery disease (CAD) risk [67,74–76] (Figure 3). In a similar fashion, genetic variants related to body mass index and obesity have been shown to influence a wide variety of metabolic, cardiovascular, and bone-related traits, strengthening evidence on the causal influence of adiposity in these cases [77–80]. Conversely, genetic variants associated with C-reactive protein (CRP) level have not been found to predict insulin resistance [80] or coronary heart disease [81], casting doubt on the causal role of CRP with respect to these conditions.

Figure 3. Applying Mendelian randomization to define the causal relationship between phenotype and disease. An example based upon the report of Linsel-Nitschke et al. (2008) [74] of the association of a variant in the LDLR gene with decreased low density lipoprotein-cholesterol (LDL-C) levels and with a reduced risk of coronary artery disease (CAD). The variant can be used in a Mendelian randomization approach to test the causal relationship between LDL-C and CAD. If LDL-C has a causal role in CAD, an association between the LDLR gene variant and disease risk would be seen (red dashed arrow). If LDL-C levels are correlated with CAD risk but not causal, then the gene variant will not show an association with CAD risk. This will establish whether reverse causation is at play and remove the potential confounding influence of factors such as smoking and nutritional status. doi:10.1371/journal.pmed.1000356.g003
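The logic of Figure 3 can be illustrated with a small simulation. The following is a minimal sketch and not the analysis used in [74]: the allele frequency, effect sizes, sample size, and the function names `covariance` and `simulate` are all hypothetical, and the outcome is treated as a continuous trait rather than a binary disease status for simplicity. It compares the confounded observational slope with the ratio (Wald) estimate, that is, the variant-outcome covariance divided by the variant-exposure covariance, which is a standard instrumental-variable estimator.

```python
import random


def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / (n - 1)


def simulate(causal_effect, n=50_000, seed=1):
    """Simulate a variant, an exposure, a confounder and an outcome.

    Returns (observational slope of outcome on exposure, Wald ratio estimate).
    All parameter values are hypothetical.
    """
    rng = random.Random(seed)
    variant, exposure, outcome = [], [], []
    for _ in range(n):
        # Allele count (0, 1 or 2) with a hypothetical allele frequency of 0.3.
        alleles = (rng.random() < 0.3) + (rng.random() < 0.3)
        confounder = rng.gauss(0, 1)  # unmeasured, affects exposure and outcome
        x = 1.0 * alleles + confounder + rng.gauss(0, 1)
        y = causal_effect * x + confounder + rng.gauss(0, 1)
        variant.append(alleles)
        exposure.append(x)
        outcome.append(y)
    observational = covariance(exposure, outcome) / covariance(exposure, exposure)
    wald_ratio = covariance(variant, outcome) / covariance(variant, exposure)
    return observational, wald_ratio


obs_causal, mr_causal = simulate(causal_effect=0.4)
obs_null, mr_null = simulate(causal_effect=0.0)
print(f"true effect 0.4: observational={obs_causal:.2f}, MR={mr_causal:.2f}")
print(f"true effect 0.0: observational={obs_null:.2f}, MR={mr_null:.2f}")
```

Under these assumptions the observational slope is inflated by the confounder in both scenarios, whereas the ratio estimate should lie close to the simulated causal effect, including when that effect is zero.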

October 2010 | Volume 7 | Issue 10 | e1000356


In the field of gene expression studies, identifying causal processes within a multitude of associations is at least as problematic as in observational epidemiological studies. For example, the majority of gene expression signatures in adipose tissue, and a high proportion (up to 10%) of those in blood, have been found to be related to obesity [82]. Methods equivalent to the Mendelian randomization approach we propose here (sometimes called “genetical genomics” [83] in the context of gene expression studies) have been applied to separate causal transcription effects from those generated by reverse causation [82]. This is facilitated by strong cis effects on gene expression, which allow isolation of specific loci influencing transcript level. The identification of strong cis effects in a genome-wide association study analysis of methylation patterns [64] provides encouragement that these methods can be extended to investigate the causal influences of epigenetic signatures in what could be called “genetical epigenomics”.

As a hypothetical example of how this approach could be applied, we will consider alcohol intake and HNSCC. It is likely that alcohol intake would be associated with a wide range of epigenetic changes, although at least some (and probably many) of these associations could reflect confounding by the many other factors related to alcohol consumption. Similarly, HNSCC could be related to a multitude of epigenetic changes, which could arise through reverse causation (the disease influencing the epigenetic patterns) or confounding (factors associated with HNSCC risk influencing the epigenetic patterns). If the epigenetic processes are to be targeted as a component of disease prevention they must be causally associated with HNSCC, and for them to mediate the effect of alcohol intake on HNSCC risk they need to be responsive to changes in alcohol intake. Observational data demonstrating an association of alcohol intake with a particular epigenetic profile exist, but the association of this profile with HNSCC risk does not, of course, establish causality. As depicted in Figure 4, Mendelian randomization approaches could be applied to this scenario.
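The two-step design of Figure 4 can be sketched in the same spirit. This is an illustrative simulation only: the variants, allele frequencies, effect sizes, and the function names `cov` and `two_step_mr` are hypothetical, disease is modeled as a continuous liability, and alcohol is assumed to act on disease solely through methylation. Step 1 uses a variant that proxies the exposure to estimate its effect on methylation; step 2 uses a separate cis-acting variant to estimate the effect of methylation on the outcome.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / (n - 1)


def two_step_mr(alcohol_to_cpg, cpg_to_disease, n=50_000, seed=7):
    """Return (step 1 estimate, step 2 estimate) from simulated data."""
    rng = random.Random(seed)
    g_exposure, g_cpg, alcohol, cpg, disease = [], [], [], [], []
    for _ in range(n):
        # Two independent variants coded as allele counts (0, 1 or 2).
        v1 = (rng.random() < 0.3) + (rng.random() < 0.3)  # proxy for alcohol intake
        v2 = (rng.random() < 0.4) + (rng.random() < 0.4)  # cis-acting on the CpG site
        u = rng.gauss(0, 1)  # confounder of intake, methylation and disease
        a = 1.0 * v1 + u + rng.gauss(0, 1)
        m = alcohol_to_cpg * a + 1.0 * v2 + u + rng.gauss(0, 1)
        d = cpg_to_disease * m + u + rng.gauss(0, 1)
        g_exposure.append(v1)
        g_cpg.append(v2)
        alcohol.append(a)
        cpg.append(m)
        disease.append(d)
    # Step 1: ratio estimate of the effect of exposure on methylation.
    step1 = cov(g_exposure, cpg) / cov(g_exposure, alcohol)
    # Step 2: ratio estimate of the effect of methylation on the outcome.
    step2 = cov(g_cpg, disease) / cov(g_cpg, cpg)
    return step1, step2


step1, step2 = two_step_mr(alcohol_to_cpg=0.5, cpg_to_disease=0.3)
print(f"alcohol -> CpG: {step1:.2f}; CpG -> disease: {step2:.2f}")
```

Setting `cpg_to_disease=0.0` in this sketch should drive the second estimate towards zero even though methylation and the outcome remain correlated through the confounder, which is the situation the second step is intended to detect.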

Epigenomic Modifiers and the Prospects for Future Treatments

It can be argued that mitotically stable changes in gene expression are very likely to underlie the development of virtually all disease (in the same way as they are an essential component in the process of the development of an organism [84]), and as definitions of epigenetics incorporate such changes, they automatically fall within the field’s remit. Once epigenetic mechanisms, even if only contributory, are unequivocally implicated in disease pathogenesis, the prospect of epigenomic-based therapies becomes a realistic possibility. A wide range of pharmacological agents that target the epigenome, including DNMT inhibitors and HDAC inhibitors, are used in clinical practice, largely as anti-cancer treatments [11]. However, these agents require further development to enhance the specificity of their pleiotropic effects, and evaluation of their efficacy in a non-cancer setting is essential. Combination therapies in which DNMT inhibitors or HDAC inhibitors are employed with other agents are an active avenue of inquiry. miRNAs are also emerging as a promising technology in drug development following an increasing understanding of their biogenesis and function. The links between miRNA expression and common complex disease are growing, providing a greater impetus to pursue this useful tool for the targeted modulation of gene regulation. As with other epigenetic signatures, their utility might also lie in disease diagnosis and prognosis [85].

Figure 4. Incorporating epigenetic information in a Mendelian randomization framework. (A) Alcohol exposure is associated with risk of head and neck squamous cell carcinoma (HNSCC) and this may be mediated by altered DNA methylation (CpG). The relationship between alcohol exposure and HNSCC is potentially confounded by factors such as socio-economic position, which correlate with both exposure and disease. A common variant in ADH1B can be used as an unconfounded, genetic proxy for alcohol exposure, and if this SNP is associated with CpG (either locally or more widely across the genome), it would lend support to the hypothesis that alcohol intake causally influences DNA methylation. However, showing associations of these epigenetic measures with HNSCC does not demonstrate causality of either alcohol or CpG on HNSCC, as either or both associations (alcohol→HNSCC and CpG→HNSCC) could be confounded or alcohol could influence HNSCC through another pathway (dashed line). (B) To investigate this, another Mendelian randomization experiment could be undertaken using an SNP known to have a cis influence on locus-specific DNA methylation. If an association were observed between this SNP and both CpG and HNSCC, this would support a role for DNA methylation in the causation of HNSCC. doi:10.1371/journal.pmed.1000356.g004


Five Key Papers in the Field

• Weaver IC, Cervoni N, Champagne FA, D’Alessio AC, Sharma S, et al. (2004) Epigenetic programming by maternal behaviour. Nat Neurosci 7: 847–854. This landmark paper demonstrated that the epigenomic state of a gene can be altered through behavioural programming and that this environmentally induced modification is potentially reversible.

• Fraga MF, Ballestar E, Paz MF, Ropero S, Setien F, et al. (2005) Epigenetic differences arise during the lifetime of monozygotic twins. Proc Natl Acad Sci U S A 102: 10604–10609. This article describes how epigenetic patterns in monozygotic twins become more discordant with advancing age. This epigenetic drift is postulated to be invoked through differences in environmental exposures.

• Bjornsson HT, Sigurdsson MI, Fallin MD, Irizarry RA, Aspelund T, et al. (2008) Intra-individual change over time in DNA methylation with familial clustering. JAMA 299: 2877–2883. This study showed greater than 10% methylation change over time, that individuals within families showed both gain and loss of methylation, and that this change in methylation showed familial clustering indicative of a genetic basis.

• Lister R, Pelizzola M, Dowen RH, Hawkins RD, Hon G, et al. (2009) Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 462: 315–322. This paper reports the first genome-wide, single base-pair resolution map of methylated cytosines in the mammalian genome from embryonic stem cells and fetal fibroblasts, showing widespread differences between the tissue types.

• Zhang D, Cheng L, Badner JA, Chen C, Chen Q, Luo W, et al. (2010) Genetic control of individual differences in gene-specific methylation in human brain. Am J Hum Genet 86: 411–419. This study demonstrated that DNA methylation is a heritable trait, determined in part by common genetic variation. The vast majority of genetically determined variation was observed to be in cis (correlation within 1 Mb of a CpG site) with only a handful of SNPs determining trans methylation (distant regulation effects).

Conclusion

Through examining the role of environmental factors in causing variation in epigenetic patterns (exposure/epigenotype) and ultimately exploring the causal impact of epigenotype on disease outcomes (epigenotype/disease) using genetical epigenomics and other methods, progress towards epigenetic interventions can be made. As genome-wide association studies and other approaches identify robust associations between genetic variants and epigenetic patterns, the potential for elucidating causal pathways and predicting the effect of manipulation, through environmental (including lifestyle) modification or pharmacotherapeutic means, is considerable. In this way, epigenetic markers may become targets for modification as well as biomarkers for exposure and disease risk. The International Human Epigenome Consortium is poised to invest millions of dollars to map 1,000 reference epigenomes in a range of normal tissues and define the level of variation that exists between individuals [86]. The field of epigenetics in relation to common complex disease will undoubtedly continue to be the focus of much attention, and its progress, now that it has passed the starting line, will be followed with considerable interest.

Acknowledgments

The authors would like to thank Prof Debbie A Lawlor and Dr Nick Embleton for their helpful comments on the manuscript.

Author Contributions

ICMJE criteria for authorship read and met: CLR GDS. Agree with the manuscript’s results and conclusions: CLR GDS. Contributed to the writing of the paper: CLR GDS.

References

1. Feero WG, Guttmacher AE, Collins FS (2010) Genomic medicine–An updated primer. New Engl J Med 362: 2001–2011.
2. Bird A (2007) Perceptions of epigenetics. Nature 447: 396–398.
3. Vaissiere T, Sawan C, Herceg Z (2008) Epigenetic interplay between histone modifications and DNA methylation in gene silencing. Mutat Res 659: 40–48.
4. Lister R, Pelizzola M, Dowen RH, Hawkins RD, Hon G, et al. (2009) Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 462: 315–322.
5. Byun HM, Siegmund KD, Pan F, Weisenberger DJ, Kanel G, et al. (2009) Epigenetic profiling of somatic tissues from human autopsy specimens identifies tissue- and individual-specific DNA methylation patterns. Hum Mol Genet 18: 4808–4817.
6. Aguilera O, Fernandez AF, Munoz A, Fraga MF (2010) Epigenetics and environment: a complex relationship. J Appl Physiol, Apr 8 [Epub ahead of print].
7. Meaney MJ (2010) Epigenetics and the biological definition of gene x environment interactions. Child Dev 81: 41–79.
8. Nicholls RD (2000) The impact of genomic imprinting for neurobehavioural and developmental disorders. J Clin Invest 105: 413–418.
9. Sharma S, Kelly TK, Jones PA (2010) Epigenetics in cancer. Carcinogenesis 31: 27–36.


10. Laird PW (2003) The power and the promise of DNA methylation markers. Nat Rev Cancer 3: 253–266.
11. Piekarz RL, Bates SE (2009) Epigenetic modifiers: Basic understanding and clinical development. Clin Cancer Res 15: 3918–3926.
12. Beck S, Rakyan VK (2008) The methylome: approaches for global DNA methylation profiling. Trends Genet 24: 231–237.
13. Jenuwein T, Allis CD (2001) Translating the histone code. Science 293: 1074–1080.
14. Feinberg AP (2009) Genome-scale approaches to the epigenetics of common human disease. Virchows Arch 456: 13–21.
15. Campion J, Milagro FI, Martinez JA (2009) Individuality and epigenetics in obesity. Obes Rev 10: 383–392.
16. Tollefsbol TO (2004) Methods of epigenetic analysis. In Tollefsbol TO. Epigenetics protocols. Secaucus (New Jersey) Springer Science & Business Media. pp 1–8.
17. Thompson RF, Atzmon G, Gheorghe C, Liang HQ, Lowes C, et al. (2010) Tissue specific dysregulation of DNA methylation in aging. Aging Cell, May 22 [Epub ahead of print].
18. Talens RP, Boomsa DI, Tobi EW, Kremer D, Jukema JW, et al. (2010) Variation, patterns and temporal stability of DNA methylation: considerations for epigenetic epidemiology. FASEB J 9: 3135–3144.


19. Kim JK, Samaranayake M, Pradhan S (2009) Epigenetic mechanisms in mammals. Cell Mol Life Sci 66: 596–612.
20. Reik W, Dean W, Walter J (2001) Epigenetic reprogramming in mammalian development. Science 293: 1089–1093.
21. Bollati V, Baccarelli A (2010) Environmental epigenetics. Heredity 105: 105–112.
22. Heijmans BT, Tobi EW, Stein AD, Putter H, Blauw GJ, et al. (2008) Persistent epigenetic differences associated with prenatal exposure to famine in humans. Proc Natl Acad Sci U S A 105: 17046–17049.
23. Tobi EW, Lumey LH, Talens RP, Kremer D, Putter H, et al. (2009) DNA methylation differences after exposure to prenatal famine are common and timing- and sex-specific. Hum Mol Genet 18: 4046–4053.
24. Haig D (2007) Weismann rules! OK? Epigenetics and the Lamarckian temptation. Biol Philos 22: 415–428.
25. Boks MP, Derks EM, Weisenberger BJ, Strengman E, Janson E, et al. (2009) The relationship of DNA methylation with age, gender and genotype in twins and healthy controls. PLoS ONE 4: e6767. doi:10.1371/journal.pone.0006767.
26. Calvanese V, Lara E, Kahn A, Fraga MF (2009) The role of epigenetics in ageing and age-related diseases. Ageing Res Rev 8: 268–276.



27. Ferguson LR (2009) Epigenetic variation and customising nutritional intervention. Curr Pharmacogenomics Person Med 7: 115–124.
28. Kim KC, Friso S, Choi SW (2009) DNA methylation, an epigenetic mechanism connecting folate to healthy embryonic development and aging. J Nutr Biochem 20: 917–926.
29. Waterland RA (2006) Assessing the effects of high methionine intake on DNA methylation. J Nutr 136 (6 Suppl): 1706S–1710S.
30. Widiker S, Karst S, Wagener A, Brockman GA (2010) High fat diet leads to a decreased methylation of the Mc4r gene in the obese BFMI and the lean B6 mouse lines. J Appl Genet 51: 193–197.
31. Delage B, Dashwood RH (2008) Dietary manipulation of histone structure and function. Annu Rev Nutr 28: 347–366.
32. Myzak MC, Dashwood RH (2006) Histone deacetylases as targets for dietary cancer preventive agents: lessons learned with butyrate, diallyl disulfide and sulforaphane. Curr Drug Targets 7: 443–452.
33. Drummond MJ, Glynn EL, Fry CS, Dhanani S, Volpi E, et al. (2009) Essential amino acids increase miRNA-499, -208b and -23 in human skeletal muscle. J Nutr 139: 2279–2284.
34. Hsiung DT, Marsit CJ, Houseman EA, Eddy K, Furniss CS, et al. (2007) Global DNA methylation level in whole blood as a biomarker in head and neck squamous cell carcinoma. Cancer Epidemiol Biomarkers Prev 16: 108–114.
35. Kaur J, Demokan S, Tripathi SC, Macha MA, Begum S, et al. (2010) Promoter hypermethylation in Indian primary oral squamous cell carcinoma. Int J Cancer. E-pub ahead of print. 5 April 2010. doi:10.1002/ijc.25377.
36. Philips JM, Goodman JI (2009) Inhalation of cigarette smoke induces regions of altered DNA methylation (RAMs) in SENCAR mouse lung. Toxicology 260: 7–15.
37. Launay JM, Del Pino M, Chironi G, Callebert J, Peoc’h K, et al. (2009) Smoking induces long-lasting effects through a monoamine-oxidase epigenetic regulation. PLoS ONE 4: e7959. doi:10.1371/journal.pone.0007959.
38. Liu H, Zhou Y, Boggs SE, Belinsky SA, Liu J (2007) Cigarette smoke induces demethylation of prometastatic oncogene synuclein-gamma in lung cancer cells by downregulation of DNMT3B. Oncogene 26: 5900–5910.
39. Hussain M, Rao M, Humphries AE, Hong JA, Liu F, et al. (2009) Tobacco smoke induces polycomb-mediated repression of Dickkopf-1 in lung cancer cells. Cancer Res 69: 3570–3578.
40. Schembri F, Sridhar S, Perdomo C, Gustafson AM, Zhang X, et al. (2009) MicroRNAs as modulators of smoking-induced gene expression changes in human airway epithelium. Proc Natl Acad Sci U S A 106: 2319–2324.
41. Izzotti A, Larghero P, Cartiglia C, Longobardi M, Pfeffer U, et al. (2010) Modulation of microRNA expression by budesonide, phenethyl isothiocyanate and cigarette smoke in mouse liver and lung. Carcinogenesis 31: 894–901.
42. Guerrero-Preston R, Goldman LR, Brebi-Mieville P, Ili-Ganga C, Lebron C, et al. (2010) Global hypomethylation is associated with in utero exposure to cotinine and perfluorinated alkyl compounds. Epigenetics, Aug 14 [Epub ahead of print].
43. Breton CV, Byun HM, Wenten M, Pan F, Yang A, et al. (2009) Prenatal tobacco smoke exposure affects global and gene-specific DNA methylation. Am J Respir Crit Care Med 180: 462–467.
44. Sauer J, Jang H, Zimmerly EM, Kim KC, et al. (2010) Ageing, chronic alcohol consumption and folate are determinants of genomic DNA methylation, p16 promoter methylation and the expression of p16 in the mouse colon. Br J Nutr 104: 24–30.


45. Seitz HK, Stickel F (2007) Molecular mechanisms of alcohol-mediated carcinogenesis. Nat Rev Cancer 7: 599–612.
46. Pascual M, Boix J, Felipo V, Guerri C (2009) Repeated alcohol administration during adolescence causes changes in the mesolimbic dopaminergic and glutamatergic systems and promotes alcohol intake in the adult rat. J Neurochem 108: 920–31.
47. Miranda RC, Pietrzykowski AZ, Tang Y, Sathyan P, Mayfield D, et al. (2010) MicroRNAs: master regulators of ethanol abuse and toxicity? Alcohol Clin Exp Res 34: 575–87.
48. Tarantini L, Bonzini M, Apostoli P, Pegoraro V, Bollati V, et al. (2009) Effects of particulate matter on genomic DNA methylation content and iNOS promoter methylation. Environ Health Perspect 117: 217–222.
49. Bollati V, Baccarelli A, Hou L, Bonzini M, Fustinoni S, et al. (2007) Changes in DNA methylation patterns in subjects exposed to low-dose benzene. Cancer Res 67: 876–880.
50. Baccarelli A, Wright RO, Bollati V, Tarantini L, Litonjua AA, et al. (2009) Rapid DNA methylation changes after exposure to traffic particles. Am J Respir Crit Care Med 179: 572–578.
51. Wilker EH, Baccarelli A, Suh H, Vokonas P, Wright RO, et al. (2010) Black carbon exposures, blood pressure and interactions with single nucleotide polymorphisms in microRNA processing genes. Environ Health Perspect 118: 943–948.
52. Shin CM, Kim N, Jung Y, Park JH, Kang GH, et al. (2010) The role of Helicobacter pylori infection in aberrant DNA methylation along multistep gastric carcinogenesis. Cancer Sci. E-pub ahead of print. 18 February 2010. doi:10.1111/j.1349-7006.2010.01535.x.
53. Tsai CN, Tsai CL, Tse KP, Chang HY, Chang YS (2002) The Epstein-Barr virus oncogene product, latent membrane protein 1, induces the downregulation of E-cadherin gene expression via activation of DNA methyltransferases. Proc Natl Acad Sci U S A 99: 10084–10089.
54. Backdahl L, Bushell A, Beck S (2009) Inflammatory signalling as mediator of epigenetic modulation in tissue-specific chronic inflammation. Int J Biochem Cell Biol 41: 176–84.
55. Medzhitov R, Horng T (2009) Transcriptional control of the inflammatory response. Nature Rev Immunol 9: 692–703.
56. Weaver IC, Cervoni N, Champagne FA, D’Alessio AC, Sharma S, et al. (2004) Epigenetic programming by maternal behaviour. Nat Neurosci 7: 847–854.
57. McGee SL, Fairlee E, Graham AP, Hargreaves M (2009) Exercise-induced histone modifications in human skeletal muscle. J Physiol 587: 5951–5981.
58. Radom Aizik S, Zaldivar FP, Jr., Oliver SR, Galassetti PR, Cooper DM (2010) Evidence for microRNA involvement in exercise-associated neutrophil gene expression changes. J Appl Physiol 109: 252–261.
59. Kadota M, Yang HH, Hu N, Wang C, Hu Y, et al. (2007) Allele-specific chromatin immunoprecipitation studies show genetic influence on chromatin state in human genome. PLoS Genet 3: e81. doi:10.1371/journal.pgen.0030081.
60. Wong CC, Caspi A, Williams B, Craig IW, Houts R, et al. (2010) A longitudinal study of epigenetic variation in twins. Epigenetics 5: 516–526.
61. Bjornsson HT, Sigurdsson MI, Fallin MD, Irizarry RA, Aspelund T, et al. (2008) Intra-individual change over time in DNA methylation with familial clustering. JAMA 299: 2877–2883.
62. French HJ, Attenborough R, Hardy K, Shannon F, Williams RBH (2009) Interindividual variation in epigenomic phenomena in humans. Mamm Genome 20: 604–611.
63. Heijmans BT, Kremer D, Tobi EW, Boomsa DI, Slagboom PE (2007) Heritable rather than age-related and stochastic factors dominate variation in DNA methylation of the human IGF2/H19 locus. Hum Mol Genet 16: 547–554.
64. Zhang D, Cheng L, Badner JA, Chen C, Chen Q, et al. (2010) Genetic control of individual differences in gene-specific methylation in human brain. Am J Hum Genet 86: 411–419.
65. Dimas AS, Dermitzakis ET (2009) Genetic variation of regulatory systems. Curr Opin Genet Dev 19: 586–590.
66. Kong A, Steinthorsdottir V, Masson G, Thorleifsson G, Sulem P, et al. (2009) Parental origin of sequence variants associated with complex diseases. Nature 462: 868–874.
67. Davey Smith G, Lawlor DA, Harbord R, Timpson N, Day I, et al. (2007) Clustered environments and randomized genes: a fundamental distinction between conventional and genetic epidemiology. PLoS Med 4: e352. doi:10.1371/journal.pmed.0040352.
68. Davey Smith G, Ebrahim S (2003) ‘Mendelian randomization’: can genetic epidemiology contribute to understanding environmental determinants of disease? Int J Epidemiol 32: 1–22.
69. Davey Smith G (2010) Mendelian randomization for strengthening causal inference in observational studies: applications to gene by environment interaction. Perspect Psychol Sci. In press.
70. Sheehan NA, Didelez V, Burton PR, Tobin MD (2008) Mendelian randomization and causal inference in observational epidemiology. PLoS Med 5: e177. doi:10.1371/journal.pmed.0050177.
71. Lewis SJ, Davey Smith G (2005) Alcohol, ALDH2 and esophageal cancer: a meta-analysis which illustrates the potentials and limitations of a Mendelian randomization approach. Cancer Epidemiol Biomarkers Prev 14: 1967–1971.
72. Boccia S, Hashibe M, Galli P, De Feo E, Asakage T, et al. (2009) Aldehyde dehydrogenase 2 and head and neck cancer: a meta-analysis implementing a Mendelian randomization approach. Cancer Epidemiol Biomarkers Prev 18: 248–254.
73. Chen L, Davey Smith G, Harbord R, Lewis S (2008) Alcohol intake and blood pressure: a systematic review implementing Mendelian randomization approach. PLoS Med 5: e52. doi:10.1371/journal.pmed.0050052.
74. Linsel-Nitschke P, Gotz A, Erdmann J, Braenne I, Braund P, et al. (2008) Lifelong reduction of LDL-cholesterol related to a common variant in the LDL-receptor gene decreases the risk of coronary artery disease–a Mendelian randomization study. PLoS ONE 3: e2986. doi:10.1371/journal.pone.0002986.
75. Teslovich M, Musunumu K, Smith AV, Edmondson AC, Stylianou IM, et al. (2010) Biological, clinical and population relevance of 95 loci for blood lipids. Nature 466: 707–713.
76. Schuldiner AR, Pollin TI (2010) Variation in blood lipids. Nature 466: 703–704.
77. Freathy RM, Timpson NJ, Lawlor DA, Pouta A, Ben-Shlomo Y, et al. (2008) Common variation in the FTO gene alters diabetes-related metabolic traits to extent expected, given its effect on BMI. Diabetes 57: 1419–1426.
78. Timpson N, Harbord R, Davey Smith G, Zacho J, Tybaerg-Hansen A, et al. (2009) Does greater adiposity increase blood pressure and hypertension risk? Mendelian randomization using Fto/Mc4r genotype. Hypertension 54: 84–90.
79. Timpson NJ, Sayers A, Davey Smith G, Tobias JH (2002) How does body fat influence bone mass in childhood? A Mendelian randomisation approach. J Bone Miner Res 24: 522–533.
80. Timpson NJ, Lawlor DA, Harbord RM, Gaunt TR, Day INM, et al. (2005) C-reactive protein and its role in metabolic syndrome: Mendelian randomisation study. Lancet 366: 1954–1959.
81. Zacho J, Tybjoerg-Hansen A, Jensen JS, Grande P, Sillensen H, et al. (2008) Genetically elevated C-reactive protein and ischaemic vascular disease. New Engl J Med 359: 1897–1908.



82. Emilsson V, Thorleifsson G, Zhang B, Leonardson AS, Zink F, et al. (2008) Genetics of gene expression and its effect on disease. Nature 452: 423–428.
83. Li H, Lu L, Manly KF, Chesler EJ, Bao L, et al. (2005) Inferring gene transcriptional modulatory relations: a genetical genomics approach. Hum Mol Genet 14: 1119–1125.
84. Gilbert SF, Epel D (2009) Ecological developmental biology: Integrating epigenetics, medicine and evolution. MA, USA: Sinauer Associates Inc.


85. Liu Z, Sall A, Yang D (2008) MicroRNA: an emerging therapeutic target and intervention tool. Int J Mol Sci 9: 978–999.
86. Abbott A (2010) Project set to map marks on genome. Nature 463: 596–597.



Spatial Epigenetic Control of Mono- and Bistable Gene Expression

János Z. Kelemen, Prasuna Ratna, Simone Scherrer, Attila Becskei*

Institute of Molecular Life Sciences, University of Zurich, Zurich, Switzerland

Abstract

Bistability in signaling networks is frequently employed to promote stochastic switch-like transitions between cellular differentiation states. Differentiation can also be triggered by antagonism of activators and repressors mediated by epigenetic processes that constitute regulatory circuits anchored to the chromosome. Their regulatory logic has remained unclear. A reaction–diffusion model reveals that the same reaction mechanism can support both graded monostable and switch-like bistable gene expression, depending on whether recruited repressor proteins generate a single silencing gradient or two interacting gradients that flank a gene. Our experiments confirm that chromosomal recruitment of activator and repressor proteins permits a plastic form of control; the stability of gene expression is determined by the spatial distribution of silencing nucleation sites along the chromosome. The unveiled regulatory principles will help to understand the mechanisms of variegated gene expression, to design synthetic genetic networks that combine transcriptional regulatory motifs with chromatin-based epigenetic effects, and to control cellular differentiation.

Citation: Kelemen JZ, Ratna P, Scherrer S, Becskei A (2010) Spatial Epigenetic Control of Mono- and Bistable Gene Expression. PLoS Biol 8(3): e1000332. doi:10.1371/journal.pbio.1000332

Academic Editor: Andre Levchenko, Johns Hopkins University, United States of America

Received October 14, 2009; Accepted February 9, 2010; Published March 16, 2010

Copyright: © 2010 Kelemen et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This work is supported by the Swiss National Foundation and by the UZH-URPP. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

Abbreviations: GA, gene activation

* E-mail: attila.becskei@imls.uzh.ch

Introduction

Graded and switch-like responses reflect fundamental aspects of the functioning of regulatory networks. A graded, monostable response enables the faithful propagation of a signal, and it is often the default response of simple pathways, but regulatory mechanisms can improve the linearity and the dynamic range of the graded response [1,2]. Conversely, when the signal strength reaches a threshold value, the switch-like response is often manifested in ON and OFF states within a cell population. This binary response can be induced by positive feedback loops capable of generating bistability, but many other mechanisms can support it by rendering the underlying processes more nonlinear and stochastic [3–9]. Positive feedback loops in transcriptional or protein kinase networks have been increasingly recognized as a driving force of cellular differentiation [10,11]. The components of these networks are dissolved in the cytoplasm or nucleoplasm, and typically have a spatially homogeneous distribution. In contrast, inhomogeneously distributed regulatory components are frequently observed in eukaryotic transcriptional regulation. Binding of eukaryotic transcriptional factors (activators and repressors) to the DNA can lead to recruitment of enzymes and structural proteins of opposing functions that induce structural changes and covalent modifications of chromatin, exemplified by acetylation and methylation [12,13]. This leads to a spatially inhomogeneous distribution of regulators along the DNA, constituting the epigenetic code. Activators loosen the chromatin structure. Conversely, the compaction of chromatin and heterochromatin formation are typically induced by repressors or repressor-recruiting DNA sequences that act or interact over long distances, variously termed as long-range repressors, silencing proteins, and silencers in different systems and organisms [14–17]. Genes exposed to the antagonism of activators and repressors or silenced chromosomal regions have been frequently observed to display binary response [13,18–21]. Although regulatory principles underlying the graded and binary responses generated by networks with spatially homogeneously distributed components have been increasingly elucidated, the quantitative aspects of the behavior of epigenetic circuits anchored to the chromosome have remained unclear.

We examined whether the spatial distribution of activator and repressor binding sites influences gene expression to become monostable or bistable. We examined long-range interactions between these sites. Since long intervening DNA sequences can receive signals from endogenous cellular pathways, we used heterologous synthetic gene expression systems precluding pleiotropic cellular effects. Synthetic networks have been instrumental in reconstituting nonmonotonous responses and in revealing the basic principles of binary response and bistability in transcriptional regulatory networks based on feedback or competition of activators and repressors [19,22–26]. We identified a concise nonlinear reaction–diffusion equation that explains gene expression of a large number of genetic constructs with different configurations. We found that binary response is not inherent to repressor proteins exhibiting synergy over long distances. Both graded and binary responses can arise depending on the spatial distribution of the binding sites of the repressors along the DNA.

PLoS Biology | www.plosbiology.org


March 2010 | Volume 8 | Issue 3 | e1000332


Chromosomal Epigenetic Regulatory Circuits

Author Summary

In the simplest scenario, a gene is expressed when an activator protein binds to its regulatory sequence, and is silenced when the regulatory sequence is bound by a repressor. Many genes are regulated by both activators and repressors, with the response determined by the combined influence of both factors. When the response is monostable graded, expression is finely tuned to a level that reflects the proportion of the bound activator to the bound repressor. Monostable graded systems allow cells to respond precisely to stimuli. If the response is bistable, the response of each cell depends on whether the activator or the repressor wins. Bistable regulation results in the same gene being expressed in some cells and silenced in others, an outcome that promotes cellular differentiation. It remains unclear, however, how different genetic regulatory structures code for monostable graded and bistable responses. We modeled mathematically the behavior of repressors as they bind to and spread their inhibitory effect along genes and found that the spatial distribution of the binding sites determines which response is chosen. If repressors bind both upstream and downstream of the coding sequence, the response is bistable. If they bind only to one side of the coding sequence, the response is monostable. We confirmed our theoretical findings using synthetic genetic constructs in yeast. These findings help to explain how variations in the location of regulatory elements can lead to cellular differentiation and adaptation to varying environments.

Results

Bistable Synergistic Interaction of Silencing Gradients

Silencing is efficiently induced when multiple silencers interact [14]. To mimic this architecture, we inserted binding sites for the silencing protein Sir3p (in the form of a fusion protein) both downstream and upstream of a gene reporter construct, in the model organism Saccharomyces cerevisiae. When recruited to these dual recruitment constructs, Sir3p evoked a variegated GFP expression at intermediate levels of gene activation (GA) with a bimodal distribution of cellular fluorescence (Figure 1A and 1B). When GA was enhanced, all of the cells switched from the OFF to the ON expression state, so that the ON state was affected only by a residual repression (Figure 1A). Thus, a small change in the input generated a large change in the output. The ON and OFF cell populations represent a simple form of cellular differentiation. To understand the principles of this form of differentiation, we built a mathematical model based on realistic molecular processes. Due to the complexity and incomplete description of these processes, we sought to identify key mechanisms that can account for bistability in the dual recruitment constructs. The changes in the concentration of the silencing protein at a given point in space and time, c(x, t), are governed by source s(x), reaction r(c), and nonlinear diffusion terms (Figure 1C, Table S1, and Text S1).

$$\frac{\partial c}{\partial t} = r(c) + s(x) + \frac{\partial}{\partial x}\left( D(x,c)\,\frac{\partial c}{\partial x} \right) \tag{1}$$

The nucleation term, s(x), represents the recruitment of the silencing proteins, and it is a rectangular function. Its width, sw, is proportional to the number of tet operators, while the height of the rectangle, sh, is proportional to the amount of the silencing proteins recruited to the operators. Thus, sh is a function of the doxycycline concentration. The constant nucleation of silencing proteins is necessary for the establishment of steady-state concentration profiles of silencing proteins around the nucleation sites (Figure S1).

Silencing proteins and their cofactors spread along the chromosome, whereby nonspecific protein–DNA interactions can facilitate their sliding, a process described by one-dimensional diffusion [18,27–30]. The diffusivity, D(x, c), itself is a variable because the silencing proteins, in particular Sir3, can bridge neighboring DNA segments and condense the chromatin in a concentration-dependent manner, leading to heterochromatin formation [27]. Consequently, the superimposed concentration gradient becomes steeper, accelerating the flux of silencing proteins. Thus, D(x, c) was approximated by DA·c, so that the diffusion term was expressed as

$$\frac{\partial}{\partial x}\left( D_A\, c\, \frac{\partial c}{\partial x} \right).$$

This non-Fickian diffusion term arises in models where diffusional clustering or condensation of particles is described [31,32]. The reaction term represents an autocatalytic loop based on processes encompassing the cooperative binding of Sir3p and Sir4p, mutual binding of Sir2p, Sir3p, and Sir4p, deacetylation of chromatin by Sir2p creating higher affinity sites for Sir3p and Sir4p, and polymerization of Sir3p proteins [18,27,33–36]:

$$r(c) = L\,\frac{c^{n}}{K + c^{n}} - k_d\, c + b$$

It is assumed that the autocatalytic association of the silencing proteins is superimposed onto a basal, nonspecific association, occurring at a rate of b. The former is represented by a Hill function, where L stands for the maximal association rate in the limit of c → ∞. The dissociation of the silencing proteins is a linear process, and occurs at a rate of kd. Initial conditions with uniformly distributed low and high concentrations were used to reflect biochemical fluctuations in the initial accumulation of the silencing proteins (Figure 1D). The simulation of the reaction–diffusion model (Equation 1) revealed that when two silencing nucleation sites were positioned in sufficient proximity, the two initial conditions gave rise to two distinct solutions representing two concentration profiles (Figures 1D and S2). The low-concentration profile was composed of two isolated gradients around the silencing nucleation sites. The high-concentration profile represented a synergistic interaction of the two nucleation sites (Figure 1E).

Stability Diagram of Gene Expression as a Function of Transcriptional Activation

The coexistence of two concentration profiles for the same parameter values is in accord with the co-occurrence of ON and OFF cells at intermediate GA (Figures 1A, 1E, and S3). For a more detailed analysis of bistability, the gene expression has to be calculated from the concentration profiles. Gene expression is determined jointly by transcriptional activation and silencing. Quantitatively, gene expression is defined as the product of GA and fold inhibition due to silencing (see also Materials and Methods). Transcriptional activators not only induce gene expression, but also reduce the spreading of silencing proteins because activators recruit enzymes that relax the structure of chromatin, diminishing the slope of the superimposed concentration gradient [37]. Furthermore, the recruited histone acetyltransferases decrease the number of the available high-affinity binding sites for the silencing proteins [18,33]. Therefore, the diffusivity was set to decrease with GA, DA = D0 · KGA/(KGA + GA). Fold inhibition was equated with the concentration of silencing proteins at the gene regulatory region,


Figure 1. Reaction–diffusion model of bistable repression. (A) In the dual recruitment construct, tetR-Sir3p, denoted as R, binds to the [tetO]2 and [tetO]4 operators upstream and downstream of the reporter gene, respectively, in the absence of doxycycline. Repression is relieved after addition of d = 2 µM doxycycline, which dissociates tetR from the tet operators. Gene expression is activated by the estradiol (e)-inducible GEV, denoted as A. The fluorescence value represents the mean of the fitted Gaussian distribution of the cell fluorescence. The area of the circle reflects the proportions of the ON and OFF cells when the distribution was bimodal. (B) Fluorescence and DIC merged images of cells expressing [tetO]2-GFPT-YFP-[tetO]4 regulated by tetR-Sir3p. Cells were induced by e = 11 nM in the absence of doxycycline. (C) The steps involved in the reaction–diffusion model (from top to bottom): nucleation, autocatalytic recruitment, and nonlinear diffusion. The S-shaped distortion of the DNA symbolizes the aggregation of the silencing proteins. (D) Evolution of the simulated concentration distributions of silencing proteins along a DNA segment nucleated at two sites. The top and bottom panels show the convergence of the profiles to the steady state representing the low and high silencing states, respectively. The corresponding initial conditions were c(x, 0) = 2 and 4. The following parameters were used for Equation 1: L = 5, K = 7, n = 2, kd = 1, b = 0.01, DA = 1, sh = 4, and sw = 0.057. The internucleation distance was 1.2 kb. (E) The low (gray continuous line) and the high (red dashed line) concentration profiles represent the long-term solution (200 time units after the initiation) of the model as specified in (D) to reflect the steady state. The blue lines denote the nucleation sites. (F) The two solutions overlap when silencing was nucleated at a single site, calculated as in (E), indicating that the solution is monostable (gray-red dashed line).
doi:10.1371/journal.pbio.1000332.g001
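As a rough illustration of how Equation 1 could be integrated numerically, the sketch below discretizes ∂c/∂t = r(c) + s(x) + ∂/∂x(DA·c·∂c/∂x), with r(c) = L·c^n/(K + c^n) − kd·c + b, using one explicit finite-volume step and the parameter values quoted in (D). The grid, the time step, the zero-flux boundaries, the placement of the two sites at ±0.6 kb, and all function names are assumptions made for this example; the numerical details actually used are described in Text S1 and are not reproduced here.

```python
import numpy as np

# Parameter values quoted in the Figure 1D caption.
L_MAX, K, N_HILL, KD, B = 5.0, 7.0, 2, 1.0, 0.01
DA, SH, SW = 1.0, 4.0, 0.057
SITES = (-0.6, 0.6)  # assumed nucleation centres in kb, 1.2 kb apart


def reaction(c):
    """Reaction term r(c): Hill-type association, linear loss, basal rate."""
    return L_MAX * c**N_HILL / (K + c**N_HILL) - KD * c + B


def source(x):
    """Rectangular nucleation term s(x): height SH over width SW per site."""
    s = np.zeros_like(x)
    for x0 in SITES:
        s[np.abs(x - x0) <= SW / 2] = SH
    return s


def step(c, x, dt):
    """One explicit Euler step with zero-flux ends (an assumption)."""
    dx = x[1] - x[0]
    # Flux across each cell interface: -DA * c * dc/dx, c averaged at the face.
    flux = -DA * 0.5 * (c[1:] + c[:-1]) * np.diff(c) / dx
    div = np.zeros_like(c)
    div[:-1] -= flux / dx  # flux leaving cell i through its right face
    div[1:] += flux / dx   # the same flux entering cell i + 1
    return c + dt * (reaction(c) + source(x) + div)


def count_reaction_fixed_points(c_max=10.0, n=100001):
    """Count sign changes of r(c) on a uniform grid over [0, c_max]."""
    r = reaction(np.linspace(0.0, c_max, n))
    return int(np.sum(np.sign(r[1:]) != np.sign(r[:-1])))


# Hypothetical usage: a 6-kb window, low initial condition, a few short steps.
x = np.linspace(-3.0, 3.0, 601)
c = np.full_like(x, 2.0)
for _ in range(100):
    c = step(c, x, dt=1e-5)
```

With these values the reaction term on its own changes sign only once on [0, 10] (near c ≈ 0.01), so in this sketch a second steady state would have to come from the interplay of nucleation and nonlinear diffusion. Reaching the long-term profiles of (E) would require integrating to about 200 time units with a time step small enough for explicit stability, roughly dt < dx²/(2·DA·c).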


assuming a linear relation between them. Since repression from the upstream and downstream sites interacts multiplicatively [38], fold inhibition was calculated as

$$\text{fold inhibition} - 1 = \left(c(x_u) + 1\right)\left(c(x_d) + 1\right) - 1 \tag{2}$$

where xu and xd correspond to the positions −0.38 kb and 0 kb, respectively (Figure 2A). When DA was high due to the weak GA, simulations initiated with both conditions converged to the synergistically interacting

Figure 2. Prediction of gene expression based on the concentration profiles of silencing proteins. The values of the parameters are given in Table S1, unless otherwise indicated. (A) Inhibition of gene expression, expressed as fold inhibition − 1, was calculated from the values of the silencing concentration gradient at the positions xu = −0.38 and xd = 0 kb (yellow dots), which span the transcriptional regulatory region of the gene (Equation 2). The upstream point, xu, corresponds approximately to the region of the activator binding sites while the downstream point, xd, corresponds to the transcriptional initiation site. These points were chosen as plausible sites of action of silencing proteins. The silencing nucleation sites are positioned at −0.6 and 0.6 kb in the dual nucleation setting. (B) The upwards and downwards arrows represent the solutions initiated with low (c(x, 0) ≤ 2) and high (c(x, 0) ≥ 4) starting concentrations, for the [O]2-Gene-[O]4 setting. When the solutions converge, the two arrows merge into an arrow with two arrowheads (monostable region). Double arrows represent weighted mean values of the two solutions to reflect the population average in the bistable region. The red and blue arrows represent solutions with DA = D0 · KGA / (KGA + GA) and DA = D0 · KGA / [1.36 · (KGA + GA)], respectively. The reduction of diffusivity for the blue arrows reflects the effect of the transcriptional activators bound to the downstream sites that do not contribute to GA. (C) GA reflects the ratio of expression at the applied estradiol concentration to that at maximal induction (200 nM estradiol), in the absence of repression (d = 2 µM). Fold inhibition − 1 at the applied estradiol concentration reflects the change in gene expression when the repressor binds to the recruitment site (see Materials and Methods). Fold inhibition − 1 was measured for the [tetO]2-GFP-[tetO]4 (red symbols) and the [tetO]2-GFP-GALUAS-[tetO]4 (blue symbols) constructs when the fluorescence distributions were unimodal (o) or bimodal (‘). The insertion of the GALUAS did not increase the maximal expression of the construct relative to the control constructs (unpublished data). (D) Calculations performed for the Gene-[O]4 setting as in (B). (E) Fold inhibition − 1 was measured for the GFP-[tetO]4 (red symbols) and the GFP-GALUAS [tetO]4 (blue symbols) constructs, as in (B).
doi:10.1371/journal.pbio.1000332.g002
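The two relations used in this figure, the GA-dependent diffusivity and the multiplicative fold inhibition of Equation 2, can be written as small helper functions. This is a minimal sketch: the function names are made up for illustration, and because Table S1 is not reproduced here, the values of D0 and KGA in the usage lines are placeholders rather than the fitted parameters.

```python
def diffusivity(ga, d0, k_ga, extra_reduction=1.0):
    """DA = D0 * KGA / (extra_reduction * (KGA + GA)).

    extra_reduction = 1.0 corresponds to the red arrows; 1.36 corresponds
    to the blue arrows (activators bound downstream, not contributing to GA).
    """
    return d0 * k_ga / (extra_reduction * (k_ga + ga))


def fold_inhibition_minus_one(c_up, c_down):
    """Equation 2: (c(xu) + 1) * (c(xd) + 1) - 1."""
    return (c_up + 1.0) * (c_down + 1.0) - 1.0


# Hypothetical usage with placeholder parameter values.
d_weak_ga = diffusivity(ga=0.0, d0=2.0, k_ga=0.5)      # no activation
d_strong_ga = diffusivity(ga=0.5, d0=2.0, k_ga=0.5)    # stronger activation
inhibition = fold_inhibition_minus_one(c_up=1.0, c_down=2.0)
```

With no silencing protein at either position the function returns 0, that is, a fold inhibition of 1 (no repression), which matches the convention of plotting fold inhibition − 1.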


concentration profiles. Correspondingly, gene expression was inhibited strongly. In contrast, the inhibition was weak when GA was strong (Figure 2B). At intermediate activation, the strongly and weakly inhibited states co-occurred. In summary, increasing GA is accompanied by a transition from the monostable OFF to the monostable ON state through a bistable region, creating a characteristic bifurcation diagram (stability within the mono- and bistable terms refers to the number of steady states) (Figure 2B). The bifurcation diagram was in accordance with the transitions observed for the silenced gene expression as the GA was varied experimentally, recapitulating a classical binary response (Figures 1A and 2C).

The model can be validated when further activator binding sites are inserted between the two silencing nucleation sites in a way that they do not contribute to gene expression (Figure 2C). In this case, the model predicted that the bifurcation diagram would not change qualitatively; only the respective stability regions would shift toward the lower GA levels since the diffusion of silencing proteins is further diminished (Figure 2B). We tested this prediction by inserting activator binding sites between the terminator of the reporter gene and the tet operators, where they do not activate gene expression (Figure 2C). Indeed, bimodal expression was observed for a lower range of GA (Figures 2C and S4). In the above model, the reduction of DA between the silencing nucleation sites was spatially uniform. We compared this simple model with a more complex one, in which the reduction of DA was more pronounced in the proximity of the activator binding sites. The solutions of the two models were in qualitative agreement (Figure S5).

Lateral Amplification of Silencing Gradients

Whereas the predicted concentration gradient is strongly amplified between the two nucleation sites, a moderate amplification was also predicted outside of the internucleation segment (Figure 1E). To test this lateral amplification, we compared the inhibition of gene expression when Sir3p was recruited downstream of GFP either to a single site or to two sites separated by a 1-kb-long transcription unit, expressing Cherry (Figure 3A). Indeed, inhibition was about threefold stronger for the dual recruitment construct than for the single recruitment construct (Figure 3B), suggesting that the model adequately describes the shape of the gradient. The lateral amplification is predicted to be stronger when DA is high (compare Figures 1E, 1F, and S5). The detection of lateral amplification in the convergent transcription constructs (Figure 3A) may have been facilitated by the presence of two terminators separating the GFP and Cherry genes, because silencing, and possibly the spreading of silencing proteins, can be enhanced by transcriptional terminators [38,39].

Critical Nucleation Lengths Are Required for Synergistic Bistable Response

Bistable systems can undergo bifurcations with respect to multiple parameters. Therefore, we explored the stability of predicted gene expression as a function of the width of the nucleation sites. The above simulations represented systems with two operators upstream and four operators downstream of the reporter gene (Figure 2B). When the width of the downstream nucleation site was halved, the bistable response persisted: the synergistic monostable, the bistable, and the low monostable concentration profiles alternated as gene expression increased (Figure 4A). Indeed, the experiments utilizing the [tetO]2-GFP-[tetO]2 construct showed bimodal gene expression at intermediate GA and strong average repression (Figure 4C). When the width of both nucleation segments was halved relative to the previous setting, bistability collapsed, and only the low-concentration profiles were seen over the entire range of GA (Figure 4B). In the corresponding experiments, the number of tet

Figure 3. Lateral amplification of silencing gradients. (A) The lateral amplification of silencing gradients can be read out with constructs in which GFP expression is repressed either by a single downstream cluster of recruitment sites, [tetO]2, or by two downstream clusters of recruitment sites separated by a transcription unit, [tetO]2-Cherry-[tetO]2. (B and C) Fold inhibition − 1 was measured for GFP expression for the GFP-[tetO]2 and the GFP-[tetO]2-Cherry-[tetO]2 constructs. The ratio of the inhibition strengths (see Materials and Methods) of the dual recruitment constructs to that of the single recruitment constructs was 3.2 ± 0.31 and 1.77 ± 0.31 for Sir3p (B) and Sum1p (C), respectively.
doi:10.1371/journal.pbio.1000332.g003


Figure 4. Stability of gene expression and inhibition strength as a function of the number and distribution of nucleation sites. (A and B) Concentration profiles calculated for the [O]2-Gene-[O]2 (A) and [O]1-Gene-[O]1 (B) settings. The red dashed and gray continuous lines represent the solutions initiated with the two initial conditions. The two solutions overlap when GA is either weak or strong (thin and thick red-gray dashed lines). At intermediate GA, two distinct solutions evidenced the bistability (medium red dashed and gray dashed lines) for [O]2-Gene-[O]2. (C) Inhibition strength at single (upstream or downstream) and dual recruitment constructs. The inhibition strength is the average value for fold inhibition − 1 in the [0.06, 0.6] interval of GA. The total number of tet operators is indicated for each dual recruitment construct: [tetO]1-GFP-[tetO]1 (n = 2), [tetO]1-GFP-[tetO]2 (n = 3), [tetO]2-GFP-[tetO]2 (n = 4), and [tetO]2-GFP-[tetO]4 (n = 6). Empty symbols stand for constructs that display bimodal gene expression.
doi:10.1371/journal.pbio.1000332.g004
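The inhibition strength defined in (C) can be sketched as a short calculation. The caption only states that it is the average of fold inhibition − 1 over the GA interval [0.06, 0.6]; the version below assumes a plain arithmetic mean over the measured points that fall inside that interval, which may differ from the exact procedure in Materials and Methods. The function name and the data values are invented for illustration.

```python
import numpy as np


def inhibition_strength(ga, fold_inhibition_minus_one, lo=0.06, hi=0.6):
    """Mean of (fold inhibition - 1) over points with lo <= GA <= hi."""
    ga = np.asarray(ga, dtype=float)
    y = np.asarray(fold_inhibition_minus_one, dtype=float)
    inside = (ga >= lo) & (ga <= hi)
    if not inside.any():
        raise ValueError("no measurements inside the GA interval")
    return float(y[inside].mean())


# Made-up measurements for a single and a dual recruitment construct.
ga_levels = [0.01, 0.1, 0.3, 0.5, 0.9]
single = inhibition_strength(ga_levels, [3.0, 1.0, 0.5, 0.3, 0.1])
dual = inhibition_strength(ga_levels, [20.0, 4.0, 2.0, 1.5, 0.2])
ratio = dual / single  # ratio of inhibition strengths, dual vs. single
```

A ratio computed this way corresponds to the kind of dual-to-single comparison reported for the recruitment constructs, although the numbers here are purely illustrative.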

operators was reduced. The resulting [tetO]1-GFP-[tetO]1 construct displayed weak silencing and monostable gene expression (Figure 4C), confirming that synergistic interaction of gradients occurs only when the nucleation widths reach a certain threshold.

The Bistable Response Is Conserved for Repressors Exhibiting Long-Range Synergy

A model of a biological dynamical system can be corroborated by replacing a network component with a functionally similar component. For this purpose, we tested the Sum1p repressor that binds to the E silencer of the HML heterochromatic locus and contributes to gene silencing [40]. Its cofactor, Hst1p, is a homolog of the silencing protein Sir2p [41]. When Sum1p was recruited as a tetR-Sum1p fusion protein to tet operators, it inhibited expression of GFP, independently of whether the tet operators were positioned upstream or downstream of the reporter gene (Figure 5A). When bound to both of these sites, Sum1p inhibited gene expression in a strong, synergistic way (Figure 5A). The synergistic interaction over long distance is a phenomenon typical of silencers and repressors acting at heterochromatic loci [14,42]. At intermediate GA, expression of GFP was bimodal (Figure 5C), similar to the observations with Sir3p. The bimodal expression was observed up to 8 h after induction of gene expression (Figure S6). We also examined a well-characterized mutant form of Sum1p, Sum1-1p. This variant was identified by its ability to substitute efficiently for Sir-dependent silencing, and it can induce pronounced heterochromatin formation [43,44]. Indeed, Sum1-1p displayed a stronger synergy than Sum1p (Figure S7), and bimodal expression was observed even up to 16 h after induction (Figure S6).

We examined whether Sir3p and Sum1p interacted with the native HML I silencer synergistically. The Sir proteins are recruited to both the E and I silencers, which flank the heterochromatic HML genes, whereas Sum1p is recruited to the E silencer only [40]. The I silencer alone did not have an inhibitory effect on gene expression (Figure S8) [42]. When the reporter gene was flanked by an upstream I silencer and by downstream tet operators, both tetR-Sir3p and tetR-Sum1p induced bimodal gene expression at intermediate GA (Figures 5D, 5E, and S9). When the reporter gene was lengthened in the dual recruitment constructs, the synergistic and bistable inhibition of gene expression by Sum1p was abolished (Figure S10). This confirms that in addition to the critical nucleation strength, the two nucleation sites have to be within a critical distance to generate synergistic interaction of the silencing gradients (Figure S5). In summary, we observed similar responses for four different combinations of silencers and repressor proteins (Figures 2B, 3, and 5), suggesting that they follow the same regulatory principle


Figure 5. Repression by Sum1p displays long-range synergy and evokes bimodal gene expression in the dual recruitment constructs. The symbols in the fold inhibition plots correspond to those used in Figure 2. (A) tetR-Sum1p was recruited to [tetO]2-GFP, GFP-[tetO]4, and [tetO]2-GFP-[tetO]4 constructs. The gray dashed line represents calculated multiplicative interaction of repression from upstream and downstream sites. Fold inhibition − 1 at GA = 0.2 was 4.8 times higher for the dual recruitment construct in comparison to the multiplicative effect, confirming a strong synergy. (B) Calculations performed for the [O]2-Gene, Gene-[O]4, and [O]2-Gene-[O]4 setting as in Figure 2B. (C) tetR-Sum1p is recruited to the dual recruitment construct [tetO]2-GFP-[tetO]4. The fluorescence value represents the mean of the fitted Gaussian distribution of the cell fluorescence. The area of the circle reflects the proportions of the ON and OFF cells when the distribution was bimodal. (D) The inhibition strength at the I silencer-GFP-[tetO]4 constructs was 5.91 ± 0.91 and 7.34 ± 2.37 times higher for tetR-Sum1p and tetR-Sir3p, respectively, than that at the parent GFP-[tetO]4 constructs. (E) Cellular fluorescence distributions due to the expression of the I silencer-GFP-[tetO]4 construct repressed by tetR-Sum1p. Dots are experimental data obtained after adaptive binning, while the lines are fits using two Gaussian distributions. The cells were induced by 1.5, 5.8, 8, 11, and 200 nM estradiol (denoted by black, blue, green, orange, and red colors, respectively), d = 0 µM. AU, arbitrary units.
doi:10.1371/journal.pbio.1000332.g005
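To make the idea of describing a bimodal fluorescence distribution with two Gaussians concrete, the sketch below fits a two-component Gaussian mixture to synthetic data. Note the substitution: the caption describes fits to adaptively binned histograms, whereas this example uses a basic expectation-maximization loop on unbinned values, which is a different and simpler technique chosen only for illustration. The function name, the starting guesses, and the synthetic "OFF" and "ON" populations are all assumptions.

```python
import numpy as np


def fit_two_gaussians(values, n_iter=200):
    """Fit a two-component 1-D Gaussian mixture by expectation-maximization.

    Returns (weights, means, sigmas), each an array of length 2, with the
    component that started at the lower quartile listed first.
    """
    x = np.asarray(values, dtype=float)
    mu = np.percentile(x, [25, 75]).astype(float)  # crude starting means
    sigma = np.full(2, x.std())
    w = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        z = (x[:, None] - mu) / sigma
        pdf = w * np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))
        resp = pdf / pdf.sum(axis=1, keepdims=True)
        # M-step: update weights, means, and standard deviations.
        nk = resp.sum(axis=0)
        w = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    return w, mu, sigma


# Synthetic log-fluorescence values: 60% "OFF" cells and 40% "ON" cells.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(2.0, 0.5, 600), rng.normal(6.0, 0.5, 400)])
weights, means, sigmas = fit_two_gaussians(data)
```

In this picture the fitted means play the role of the fluorescence values of the two populations, and the weights play the role of the ON and OFF proportions shown as circle areas. The loop has no safeguards against degenerate cases (for example, a component collapsing onto a single point), so it should be read as a sketch for well-separated populations only.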

that associates the synergistic interaction of repressors over large distances with bistability (Figure 5B).

Synergistic Repressors Generate Monostable Graded Response When Their Binding Sites Are Clustered in a Single Chromosomal Segment

Surprisingly, when the silencing proteins were nucleated at a single segment, only one solution emerged using the same parameter values that generated bistability with the dual nucleation setting (Figures 1F and S2). This gradient generated by the single nucleation site was identical to the nonsynergistic solution of the dual nucleation setting (Figure 1E and 1F). When the single nucleation segment was broadened, the concentration profiles rose but remained monostable over the entire range of GA (Figure 6A–6C). Indeed, expression was monostable and responded in a graded way to the binding of Sir3p to upstream regions of promoters containing up to seven operators (Figures 4C and 6D). Monostable graded response was also observed for the entire range of GA when tetR-Sum1p and tetR-Sir3p bound to four sites downstream of reporter genes (Figures 2E and 5A). The insertion of activator binding sites between the terminator of the reporter gene and downstream operators alleviated the inhibition of gene expression (Figure 2D and 2E), similar to the case for the dual recruitment constructs (Figure 2B and 2C).

None of the above single recruitment constructs with operators clustered to a single chromosomal segment displayed bimodal gene expression. However, they all inhibited gene expression less than the dual recruitment constructs displaying synergistic inhibition of gene expression (Figure 4C). Thus, we hypothesized that bistability was not observed because the inhibition strength did not reach a critical value. In other words, the possibility cannot be excluded


Figure 6. A single cluster of silencing nucleation sites generates a graded, monostable response. (A) The concentration profiles were calculated when GA was set to 0.022 and silencing was nucleated at −0.6 kb. The nucleation site comprised one, two, four, and seven operators. The blue lines denote the width of the [O]7 nucleation site. The gray continuous and red dashed lines represent the simulated solutions initiated with low and high concentrations, c(x, 0), respectively. When they overlap, the system is monostable (red-gray dashed lines). (B) The concentration profiles were calculated for [O]7 as in (A), but GA was varied. (C) Gene expression was calculated from (B) by setting the maximal value of unrepressed gene expression to 1 (see Materials and Methods), so that the black, blue, and red lines correspond to a GA of 0.01, 0.15, and 0.43, respectively. A lognormal distribution was assigned to each calculated mean value. (D) Cellular fluorescence distributions due to the expression of [tetO]7-GFP, repressed by tetR-Sir3p (YSSD227.4). The cells were induced by 2.9, 5.7, 11, 22, 32, and 200 nM estradiol, in the absence of doxycycline.
doi:10.1371/journal.pbio.1000332.g006

that if silencing nucleated at a single cluster inhibited gene expression strongly enough, then the response would be binary. Therefore, we searched for single recruitment constructs with strong inhibitory potential. Fortuitously, when the tet operators were inserted between the activator binding sites and the TATA box, a strong inhibition of expression by both Sum1p and Sir3p was observed. In particular, Sum1p inhibited gene expression more strongly when bound to these intercalated operators than when it repressed gene expression synergistically in the dual recruitment constructs (Figure 7A). However, gene expression responded in a graded way over a broad range of activator and repressor binding when Sum1p or Sir3p bound to the intercalated operators (Figures 7B, 7C, and S11). In contrast, the dual recruitment constructs displayed bimodal gene expression when the binding of the activator and repressor was balanced (Figure 7C). The region of bistability was broader for Sir3p in comparison to Sum1p (Figure 7C), in accordance with the stronger synergistic repression and lateral amplification of the gradient by Sir3p (Figure 3B and 3C) [38].

Thus, our experiments confirmed the predictions of the reaction–diffusion model, revealing that the same mechanism can support both graded and binary gene expression depending on the spatial distribution of silencing nucleation sites. Monostable graded expression was characteristic of single nucleation constructs, whereas binary expression was found when two nucleation sites flanked a gene. The OFF and ON cells reflect the effect of the synergistically interacting and isolated silencing gradients, respectively (Figures 1E, 1F, and 4A). Thus, the ON cells are inhibited to a degree comparable to the repression of single nucleation constructs when GA is strong (Figure 5A–5C), whereas the OFF cells are inhibited synergistically.

A further exploration of the model revealed a high degree of plasticity of system behavior depending on the parameter values. In particular, the dual nucleation setting generated a graded response when the cooperativity of binding of silencing proteins was reduced (Figure S12). Furthermore, the single nucleation setting displayed bistability when the ratio of the diffusivity to the nucleation width was reduced. In the latter case, however, the silencing proteins did not propagate to long distances due to the low diffusivity, and consequently, they may have no or little impact on gene expression (Figure S13). It remains to be determined whether epigenetic silencing processes exist that assume such parameter values and display behaviors reproducing the above predictions.

Discussion

Eukaryotic transcriptional cis regulation governs developmental and differentiation programs [45]. Long-range interaction between


Figure 7. Graded responses can be generated by both Sum1p and Sir3p even when they strongly repress gene expression. (A) PGAL1tetO2 corresponds to the GAL1 promoter, in which the Mig1p binding sites, positioned between the GALUAS and the TATA box, were replaced by tet operators. Fold inhibition of gene expression of the respective constructs was obtained for unimodal (o) and bimodal (‘) distributions. (B) Gaussians were fit to fluorescence distributions induced by 3.75, 7.5, 15, 30, and 200 nM estradiol, in the absence of doxycycline. AU, arbitrary units. (C) The means of the fitted Gaussians are color coded. When the distributions were bimodal, the squares were split into two triangles of different colors. The cells were induced by 3.75, 7.5, 15, 30, and 200 nM estradiol and 0, 10, 20, 40, 80, 160, and 2,000 nM doxycycline.
doi:10.1371/journal.pbio.1000332.g007

transcription factors makes the deciphering of the logic of this regulation difficult [16,17,46]. Whereas long-range interactions can occur even in prokaryotes through looping of the intervening DNA sequences, the long-range effects of eukaryotic activators (enhancers) and repressors (silencers) are often mediated by cofactors that spread along the chromatin, modifying its composition and conformation. Therefore, eukaryotic transcriptional cis regulation requires complex spatiotemporal models to understand its logic. PLoS Biology | www.plosbiology.org

We have devised a concise reaction–diffusion model that captures the important molecular aspects of long-range synergistic repression: autocatalytic recruitment of proteins and their spreading along the DNA, accompanied by aggregation and condensation of chromatin. We presented a number of experimental tests that confirmed the model predictions. The central result of the model is that the response type depends on the distribution of silencing nucleation sites. When two clusters of


nucleation sites flank a gene, the system is bistable. For the corresponding genetic constructs, stochastic gene expression with ON and OFF cells was observed. On the other hand, a monostable graded response was generated when silencing was nucleated at a single cluster even if it was relatively long. Both types of distributions of recruitment clusters for repressors and silencing proteins have been encountered in the genome. An increasing number of promoters have been identified that are dynamically regulated by a single group of binding sites for long-range repressors even within euchromatic regions [41,47,48]. In such cases, monostable graded expression is expected to be generated by repressors that follow the regulatory mechanisms we identified. On the other hand, the synergistic interaction of two or more silencers scattered through telomeric and subtelomeric regions is thought to be required for efficient heterochromatin formation in a broad range of organisms, including yeasts and the mammalian X chromosome [14]. The identification of such silencers is hampered by the fact that in isolation, they lose their silencing capability or may even activate gene expression, so a large number of protosilencers may be hidden in the genome [14]. Genes flanked by two or more silencers are expected to display stochastic binary expression. Indeed, genes positioned in subtelomeric domains frequently display bimodal and stochastic gene expression in response to environmental stimuli [20,21,49]. For example, cell adhesion proteins are localized to subtelomeric domains and are expressed in a variegated way. This phenotypic diversity may enhance the survival and virulence of fungal cells [20,21]. Conversely, position-effect variegation, a phenomenon characterized by stochastic bimodal expression of a gene positioned in the silenced domains of the chromosome, can arise due to chromosomal aberrations and lead to developmental abnormalities and diseases [50–52].
Interaction between multiple silencing gradients can also contribute to correlations in the stochastic fluctuations of expression of genes ordered along the chromosome [53,54]. Components or mechanisms employed in silencing are often conserved between yeast and higher organisms [33]. Long-range repression and heterochromatin formation can be efficiently reconstituted by tethering the appropriate proteins (or RNA) to

the chromosome in different organisms [19,34,55,56]. Therefore, well-defined genetic systems comparable to ours can be employed to examine if the regulatory logic we unveiled is evolutionarily conserved. Our results highlight a difference between signal transducers dissolved in the cell protoplasm and regulatory circuits anchored to the chromosome. Dissolved kinases or transcription factors produce either a monostable or bistable response in a single cell depending on whether they are constitutively regulated or embedded in feedback loops (Figure 8A). In contrast, the same long-range repressor can evoke a monostable graded response at one gene but can induce stochastic transitions between ON and OFF states at another gene (Figure 8B). The outcome is determined by the distribution and density of the recruitment sites of silencing proteins and activators. The dissolved cellular regulatory networks and the spatially inhomogeneously distributed chromosomal epigenetic circuits will jointly determine gene expression and stability of cellular differentiation states [54,57–59]. Knowing the regulatory principles of the latter will certainly help to decipher their interaction and to understand how they shape cellular functioning.

Materials and Methods

Strain Construction and Growth Conditions

The expression of GFP from chromosomally integrated constructs was activated by GEV, an estradiol (e)-inducible transcriptional activator, when bound to the GALUAS, and was repressed by tetR fusion proteins (Tables S2 and S3). tetR dissociates from the tet operators in the presence of doxycycline (d), and repression was relieved at d = 2 µM. GEV was integrated into the genome at the MRP7 locus, with five copies in the resulting YSSH208. The plasmids containing the tetR-Sir3p and tetR-Sum1p constructs were integrated into the RET2 locus. The GFP reporter constructs were integrated into the YFR054c locus, unless otherwise specified. Cells containing inducible gene expression constructs were grown for 4 h after induction in minimal media, until a cell density of OD600 = 0.4–0.8.

Figure 8. Control modes of dissolved and anchored regulatory circuits. (A) A regulator dissolved in the protoplasm under constitutive or autocatalytic control can trigger either a graded or binary response in a cell population. (B) A regulator anchored to the chromosome can trigger both graded and binary responses at different genes (black-green rectangles) within a single cell. doi:10.1371/journal.pbio.1000332.g008


the normalization of the coefficients a and b to a sum of 100, $n_a + n_b = 100$. Thus, the sample sizes of M, A, and B were set empirically to correspond to percentages. Next, we performed statistical comparisons between the means of the metapopulations using a two-sample t-test with unequal variance. When the difference $\mu_M - \mu_1$ (with $\mu_1 < \mu_2$) was significant ($\alpha = 10^{-4}$), the distribution $m(x)$ was considered bimodal.
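The bimodality criterion stated above can be illustrated with a minimal sketch. It assumes that the two-Gaussian fit (Equation 3) has already been performed on the normalized distribution, so that the fitted coefficients and moments are available as plain numbers. The function names (`mixture`, `welch_t`, `is_bimodal`) are hypothetical, the test is taken to be two-sided, and the critical value uses a normal approximation to the t quantile, which is an assumption of this sketch and not necessarily what was used in the study.

```python
import math
from statistics import NormalDist


def mixture(x, a, mu1, sigma1, b, mu2, sigma2):
    """Weighted sum of two Gaussian densities (the form of Equation 3)."""
    g1 = a / (math.sqrt(2 * math.pi) * sigma1) * math.exp(-(x - mu1) ** 2 / (2 * sigma1 ** 2))
    g2 = b / (math.sqrt(2 * math.pi) * sigma2) * math.exp(-(x - mu2) ** 2 / (2 * sigma2 ** 2))
    return g1 + g2


def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    se1, se2 = var1 / n1, var2 / n2
    t = (mean1 - mean2) / math.sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df


def is_bimodal(a, mu1, sigma1, b, mu2, sigma2, alpha=1e-4):
    """Compare metapopulation M (mean 0, sd 1, size 100) with the lower fitted component."""
    n_a = 100.0 * a / (a + b)      # coefficients normalized so that n_a + n_b = 100
    n_b = 100.0 - n_a
    if mu1 <= mu2:
        mu_low, sigma_low, n_low = mu1, sigma1, n_a
    else:
        mu_low, sigma_low, n_low = mu2, sigma2, n_b
    if n_low <= 1.0:               # component too small to test
        return False
    t, _df = welch_t(0.0, 1.0, 100.0, mu_low, sigma_low ** 2, n_low)
    # Assumption: normal approximation to the two-sided t critical value.
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return abs(t) > critical
```

For well-separated components, for example `is_bimodal(50, -1.0, 0.3, 50, 1.0, 0.3)`, the difference in means is far beyond the threshold, whereas two broad, nearly coincident components are not flagged.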

Analysis of Mean Expression Values

Cellular fluorescence, $F_{e,d}$, was measured by flow cytometry. Total fluorescence of at least 5,000 cells was measured. Five to 15% of the total cell population was selected in the forward-scatter versus side-scatter plot to measure GFP fluorescence of cells with similar size. GA is the uninhibited expression at a given estradiol concentration normalized by the maximally induced uninhibited expression (e = 0.2 µM, d = 2 µM).

\[ GA_{e,2} = \frac{F_{e,2} - F_C}{F_{0.2,2} - F_C} \]

$F_C$ is the background fluorescence of the cells. Fold inhibition is the ratio of the unrepressed expression to the repressed expression (typically at d = 0), at a given degree of activation.

\[ FI_{e,d} = \frac{F_{e,2} - F_C}{F_{e,d} - F_C} \]

Supporting Information

Figure S1 Simulated evolution of concentration of silencing proteins in the absence of persistent nucleation, sh = 0. An initial pulse was provided in the form of c(x, 0) = 6 within the segment −0.6 < x < 0.6 kb. DA = 0.64. The initiated accumulation of silencing proteins dissipates after around 15 time units, indicating that a constant source of silencing proteins is needed for the maintenance of concentration profiles in the range of parameter values used in our simulations. Found at: doi:10.1371/journal.pbio.1000332.s001 (0.12 MB TIF)

Figure S2 Simulated concentration distribution of silencing proteins along a DNA segment with coarse spatial discretization. To account for the compartmental nature of chromatin, we employed the finite difference method to simulate the model (Equation 1). For the Euler discretization of space and time, the space steps were sized according to the length of the nucleosome (0.16 kb). To ensure the numerical stability of the procedure, the time step was considerably smaller than the space step. The simulation ran to reach 200 time units, similar to the simulations employing the FEM. The concentration profiles are comparable to those in Figure 1E and 1F, using the same kinetic parameters, except for D0 = 0.5, sh = 4; sw had to be extended to 0.16 kb, because this is the minimal nucleation width using the coarse spatial discretization. The steady-state concentration profiles were obtained by extending the data points to lines (as with the zero-order hold procedure) to better illustrate the coarseness of the space resolution. Found at: doi:10.1371/journal.pbio.1000332.s002 (0.19 MB TIF)
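As an illustration of the kind of explicit Euler finite-difference scheme described for Figure S2, the following minimal sketch steps a one-dimensional reaction–diffusion equation on a grid with 0.16 kb cells. Equation 1 is not reproduced in this section, so the reaction term used here (a localized source, a basal rate b, a Hill-type autocatalytic term with parameters L, K, and n, and first-order decay kd) is an assumed form chosen only to match the parameter names in the captions. The function name `simulate`, the no-flux boundaries, the grid extent, and the shortened run time are likewise hypothetical choices.

```python
def simulate(c0, dx, dt, t_end, diff, source, L=0.0, K=1.0, n=2.0, b=0.0, kd=0.0):
    """Explicit Euler stepping with a three-point Laplacian and no-flux ends.

    The reaction term is an assumed form, not the published Equation 1.
    """
    if diff > 0 and dt > dx * dx / (2.0 * diff):
        raise ValueError("time step too large for a stable explicit scheme")
    c = list(c0)
    last = len(c) - 1
    for _ in range(int(round(t_end / dt))):
        new = []
        for i, ci in enumerate(c):
            left = c[i - 1] if i > 0 else ci       # reflecting boundary
            right = c[i + 1] if i < last else ci   # reflecting boundary
            lap = (left - 2.0 * ci + right) / (dx * dx)
            hill = L * ci ** n / (K ** n + ci ** n) if ci > 0 else 0.0
            new.append(ci + dt * (diff * lap + source[i] + b + hill - kd * ci))
        c = new
    return c


# One nucleation cell of height 4 at x = 0 on a grid of nucleosome-sized steps.
dx, dt = 0.16, 0.005                           # dt is well below dx**2 / (2 * D)
grid = [(i - 25) * dx for i in range(51)]      # positions from -4.0 to 4.0 kb
source = [4.0 if abs(x) < dx / 2 else 0.0 for x in grid]
profile = simulate([0.0] * len(grid), dx, dt, t_end=20.0, diff=0.5,
                   source=source, L=5.0, K=5.0, n=2.0, b=0.01, kd=1.0)
```

The stability check reflects the usual constraint for explicit diffusion schemes, which is why the time step has to be much smaller than the space step. A run to 200 time units, as in the caption, only requires changing `t_end`.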

Thus, normalized gene expression is the product of GA and fold inhibition at given concentrations of estradiol and doxycycline. Typically, the OFF cells had fluorescence levels very close to the cellular fluorescence background, which implies that the values of fold inhibition − 1 calculated for the OFF cells after histogram fitting are associated with large measurement errors. For this reason, we calculated fold inhibition − 1 for the entire cell population, which has a higher fluorescence value. The inhibition strength is the average value for fold inhibition − 1 in the interval GA = [0.06, 0.6]. Error bars represent standard deviations calculated from three experiments.
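The quantities defined in this subsection can be written out as a short sketch. The function names and the numerical values are hypothetical. Averaging fold inhibition − 1 over the measured points that fall inside the GA interval is a simplification assumed here, since the text does not specify how values were interpolated across the interval.

```python
def normalized_activation(f_e2, f_max, f_background):
    """GA: uninhibited expression at estradiol e relative to maximal induction."""
    return (f_e2 - f_background) / (f_max - f_background)


def fold_inhibition(f_e2, f_ed, f_background):
    """FI: unrepressed over repressed expression, both background-corrected."""
    return (f_e2 - f_background) / (f_ed - f_background)


def inhibition_strength(ga_values, fi_values, lo=0.06, hi=0.6):
    """Mean of (FI - 1) over the points whose GA lies in [lo, hi]."""
    selected = [fi - 1.0 for ga, fi in zip(ga_values, fi_values) if lo <= ga <= hi]
    if not selected:
        raise ValueError("no data points with GA inside the interval")
    return sum(selected) / len(selected)


# Hypothetical fluorescence readings in arbitrary units.
background = 10.0
ga = normalized_activation(210.0, 1010.0, background)   # (210 - 10) / (1010 - 10)
fi = fold_inhibition(210.0, 60.0, background)           # (210 - 10) / (60 - 10)
```

Subtracting 1 from the fold inhibition makes an unrepressed construct score zero, so the inhibition strength measures only the excess repression.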

Histogram Fitting and Bimodality Detection

The logarithmic cellular fluorescence intensities of more than 30,000 cells were extracted from list mode files. The data were subjected to an adaptive binning algorithm [60] to determine the number of bins, and hence, a sampled function of the distribution. A mixture of two Gaussians (Equation 3) was then fitted to each discrete distribution using nonlinear regression.

\[ m(x) = \frac{a}{\sqrt{2\pi}\,\sigma_1}\, e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}} + \frac{b}{\sqrt{2\pi}\,\sigma_2}\, e^{-\frac{(x-\mu_2)^2}{2\sigma_2^2}} \qquad (3) \]

Figure S3 Parameter dependence of the switch-like transition. The surface represents the bistable region, which separates the ON and OFF expression states. L, K, and n were varied in the range [0.5, 10], [0.5, 10], and [1,3], respectively, with steps of 0.5 units each. The rest of the parameters were kept constant at the same values as used for the dual nucleation model in Figure 1E. Two long-term solutions were calculated, using the low and high initial conditions, to determine the occurrence of bistability. The surface was extrapolated from the points corresponding to parameter triplets (L, K, n) that give rise to bistability. Note, that for n = 1 (lack of cooperativity), bistability did not occur. Found at: doi:10.1371/journal.pbio.1000332.s003 (0.24 MB TIF)

Finally, the data were transformed from the log space into the linear space. To systematically detect bimodality in a distribution, we performed the following procedure. The fluorescence distribution was first normalized to a mean of zero, $\mu_M = 0$, and standard deviation of 1, $\sigma_M = 1$, and then subjected to binning and regression, as previously described. Subsequently, we considered three metapopulations for the further analysis. The first metapopulation corresponded to the measured events (M), with $\mu_M = 0$ and $\sigma_M = 1$, since the distribution had been normalized. The population size was normalized to 100. The two remaining metapopulations, denoted A and B, represented the two fitted Gaussian components (Equation 3) with the mean and variance parameters ($\mu_i$, $\sigma_i^2$) resulting from the nonlinear regression, whereas the respective population sizes $n_a$ and $n_b$ resulted from

Figure S4 Cellular fluorescence distributions due to the expression of the [tetO]2-GFP-GALUAS-[tetO]4 construct repressed by tetR-Sir3. The cells (PRY524.1) were induced by 2.1, 4.1, 8, 16, and 200 nM estradiol in the absence of doxycycline. Found at: doi:10.1371/journal.pbio.1000332.s004 (0.19 MB TIF)

Figure S5 Comparison of the concentration profiles with uniform and nonuniform diffusivities within the [O]2-Gene-[O]4 setting. GA reduces the spreading of the silencing proteins, which can be mediated by histone acetylation, and by the activator-induced transcription that disrupts heterochromatin. The former process is expected to reduce diffusivity around the activator binding sites, whereas the latter reduces


diffusivity along the entire gene. In the main simulations, the diffusion coefficient was reduced uniformly in the segment flanked by the nucleation sites to imitate reduction of diffusivity along the entire gene (see also [A, C, and E]). For comparison, we simulated concentration profiles when the diffusivity was reduced nonuniformly, around the activator binding sites (B, D, and F). The results are comparable using the two approaches. (A) DA was reduced uniformly as GA was increased in-between the nucleation sites, whereas outside of this region, D0 = 0.64. Curves represent the functions DA = 0.52, 0.36, and 0.24. (B) The nonuniform distribution is given by $D_A(x) = D_0 \cdot (1 + f \sigma N(\mu, \sigma^2))^{-1}$, where $N(\mu, \sigma^2)$ denotes the Gaussian distribution with mean $\mu$ and variance $\sigma^2$. $\mu$ was set to −0.38 kb, which corresponds to the activator binding site, while $\sigma$ equals the internucleation distance divided by four. D0 = 0.64. GA was increased by setting f to 1.5, 6, and 12. (C and D) The red dashed and gray continuous lines represent the solutions initiated with low and high starting concentrations. The internucleation distance was 1.2 kb. (E and F) Simulations as performed in (C) and (D), but the internucleation distance was increased to 1.5 kb. Consequently, the synergistic interaction between the two gradients was abolished. Found at: doi:10.1371/journal.pbio.1000332.s005 (0.55 MB TIF)

[GFP]2, GFP-T-YFP, GFP-T-lacZ integrated within the respective strains: YJKD-16, 23.4, 23.5, 23.6). The relative inhibition denotes the inhibition strength (see Materials and Methods) of the dual recruitment constructs normalized using the [tetO]2-GFP construct. The inhibition strength is the average value of the fold inhibition − 1 interpolated on the interval GA = [0.06, 0.6]. Error bars represent standard deviations calculated from three experiments. (B and C) Cellular fluorescence distributions due to the expression of the [tetO]2-[GFP]2-[tetO]4 (B) and [tetO]2-GFP-T-lacZ-[tetO]4 (C) constructs repressed by Sum1p. The cells were induced by 0, 4.1, 5.8, 16, and 200 nM estradiol, in the absence of doxycycline. No bimodal response was detected for the [tetO]2-GFP-T-lacZ-[tetO]4 construct. Found at: doi:10.1371/journal.pbio.1000332.s010 (0.36 MB TIF)

Figure S11 Cellular fluorescence distributions when expression is repressed by Sum1p. The cells (YJKD21.2.2 and YJK16) were induced by 3.75, 7.5, 15, 30, and 200 nM estradiol, in the absence of doxycycline. Found at: doi:10.1371/journal.pbio.1000332.s011 (0.24 MB TIF)

Figure S12 Monostable concentration profiles arise when cooperativity in the positive feedback loop is small. The Hill coefficient was reduced from 2 to n = 1.5. The following parameters were used for the simulations: sh = 6, L = 5, K = 5, b = 0.01, and kd = 1. The internucleation distance was 1.2 kb for the [O]2-Gene-[O]2 setting. (A) The red dashed and gray continuous lines represent the solutions initiated with low and high starting concentrations. The blue lines delimit the nucleation sites. When the two concentration profiles overlap, red–gray dashed lines are visible. Monostable concentration profiles were obtained even at intermediate GA. (B) Inhibition of gene expression, expressed as fold inhibition − 1, was calculated from the values of the silencing concentration gradients. Even though there is no bistability at intermediate GA, a sigmoidal change in fold inhibition can be seen in this range. Found at: doi:10.1371/journal.pbio.1000332.s012 (0.25 MB TIF)

Figure S6 Long-term changes in the cellular fluorescence distributions due to the expression of the [tetO]2-GFP-[tetO]4 construct repressed by Sum1p or Sum1-1p. The cells were induced by 0, 8, 11.3, 22, and 200 nM estradiol (denoted by black, blue, green, orange, and red colors, respectively), in the absence of doxycycline. Cells were grown exponentially for the period (8 h or 16 h) indicated. Bimodal expression can be seen 16 h after induction by 11.3 nM estradiol due to silencing by Sum1-1p. Found at: doi:10.1371/journal.pbio.1000332.s006 (0.45 MB TIF)

Figure S7 Synergy of repression by Sum1-1p.

Sum1-1p is the T988I mutant form of Sum1p. tetR-Sum1-1p was recruited to [tetO]2-GFP (DHS43), GFP-[tetO]4 (DHS44), and [tetO]2-GFP-[tetO]4 (DHS45) constructs. The gray dashed line represents calculated multiplicative interaction of repression from upstream and downstream sites. Fold inhibition − 1 at GA = 0.2 was 13.1 times higher for the dual recruitment construct in comparison to the multiplicative effect, indicating a very strong synergy (see also Figure 5A). Found at: doi:10.1371/journal.pbio.1000332.s007 (0.16 MB TIF)

Figure S13 Bistable concentration profiles are confined to the proximity of the nucleating segment when diffusivity is low relative to the nucleation width. The following parameters were used for the simulations: sh = 0.3, L = 5, K = 7, b = 0.01, and kd = 1 for a [O]20-Gene setting. DA was set to the indicated values uniformly between the boundaries of the simulation. The blue lines delimit the nucleation segment, sw = 0.741 kb. The widening of the nucleation segment and reduction of the diffusivity render the spatial aspect of the reaction–diffusion system less pronounced. Consequently, the behavior of the system approximates that of a simple (nonspatial) positive feedback loop that generates bistability. The yellow dots denote the concentrations at −0.38 and 0 kb, which determine the level of GA. (A and B) The red dashed and gray continuous lines represent the solutions initiated with low and high starting concentrations with DA = 0.2 (A) and 0.6 (B). A bistable solution is obtained for the lower diffusivity, DA = 0.2. It is evident that the silencing proteins do not propagate to long distances relative to the width of the nucleation segment and the concentrations of the silencing proteins at the gene regulatory region (yellow dots) are low even for the high-concentration profile. Thus, they have an effect on gene expression only in the vicinity of the nucleating segment. (C) The magnified version of the low-concentration profiles is displayed for DA = 0.2 (thin line) and 0.6 (thick line). It is evident that the concentration profile obtained for the lower diffusivity is more square-like. Found at: doi:10.1371/journal.pbio.1000332.s013 (0.29 MB TIF)

Figure S8 The I silencer alone does not repress the reporter gene. The expression induced by GEV at the I silencer-GFP-[tetO]4 construct (PRY544.1, 2545.1) was not lower than that at the GFP-tetO4 construct (YJK15), in nonrepressive conditions (tetR-Sum1p and tetR-Sir3p do not repress expression in the presence of 2 µM doxycycline). Thus, the I silencer alone does not repress the reporter gene; rather, it has a weak activatory potential. Found at: doi:10.1371/journal.pbio.1000332.s008 (0.23 MB TIF)

Figure S9 Cellular fluorescence distributions due to the expression of the I-silencer-GFP-tetO construct repressed by tetR-Sir3. The cells (PRY544.1) were induced by 1.5, 5.8, 8, 11, and 200 nM estradiol, in the absence of doxycycline. Found at: doi:10.1371/journal.pbio.1000332.s009 (0.19 MB TIF)

Figure S10 Collapse of bimodal expression as the distance between the recruitment sites for tetR-Sum1 is increased. (A) Sum1p was recruited to the dual recruitment constructs enclosing reporter genes of varying lengths (GFP,


Table S1 Constants used in the equations. Found at: doi:10.1371/journal.pbio.1000332.s014 (0.04 MB DOC)

Acknowledgments

We thank Melanie Anding for technical help, Walter Schaffner for helpful discussion, and Bernhard Dichtl for comments on the manuscript.

Table S2 Strains. Found at: doi:10.1371/journal.pbio.1000332.s015 (0.06 MB DOC)

Author Contributions

The author(s) have made the following declarations about their contributions: Conceived and designed the experiments: AB. Performed the experiments: JZK PR. Analyzed the data: JZK AB. Contributed reagents/materials/analysis tools: SS. Wrote the paper: JZK AB.

Table S3 Plasmids. Found at: doi:10.1371/journal.pbio.1000332.s016 (0.05 MB DOC)

Text S1 Supporting text and references. Found at: doi:10.1371/journal.pbio.1000332.s017 (0.05 MB DOC)

References

1. Nevozhay D, Adams RM, Murphy KF, Josic K, Balazsi G (2009) Negative autoregulation linearizes the dose-response and suppresses the heterogeneity of gene expression. Proc Natl Acad Sci U S A 106: 5123–5128.
2. Takahashi S, Pryciak PM (2008) Membrane localization of scaffold proteins promotes graded signaling in the yeast MAP kinase cascade. Curr Biol 18: 1184–1191.
3. Ferrell JE, Jr., Bhatt RR (1997) Mechanistic studies of the dual phosphorylation of mitogen-activated protein kinase. J Biol Chem 272: 19008–19016.
4. Blake WJ, Balazsi G, Kohanski MA, Isaacs FJ, Murphy KF, et al. (2006) Phenotypic consequences of promoter-mediated transcriptional noise. Mol Cell 24: 853–865.
5. Paliwal S, Iglesias PA, Campbell K, Hilioti Z, Groisman A, et al. (2007) MAPK-mediated bimodal gene expression and adaptive gradient sensing in yeast. Nature 446: 46–51.
6. Kim SY, Ferrell JE, Jr. (2007) Substrate competition as a source of ultrasensitivity in the inactivation of Wee1. Cell 128: 1133–1145.
7. Burnett JC, Miller-Jensen K, Shah PS, Arkin AP, Schaffer DV (2009) Control of stochastic gene expression by host factors at the HIV promoter. PLoS Pathog 5: e1000260. doi:10.1371/journal.ppat.1000260.
8. Ansel J, Bottin H, Rodriguez-Beltran C, Damon C, Nagarajan M, et al. (2008) Cell-to-cell stochastic variation in gene expression is a complex genetic trait. PLoS Genet 4: e1000049. doi:10.1371/journal.pgen.1000049.
9. Kalmar T, Lim C, Hayward P, Munoz-Descalzo S, Nichols J, et al. (2009) Regulated fluctuations in Nanog expression mediate cell fate decisions in embryonic stem cells. PLoS Biol 7: e1000149. doi:10.1371/journal.pbio.1000149.
10. Muzzey D, van Oudenaarden A (2006) When it comes to decisions, myeloid progenitors crave positive feedback. Cell 126: 650–652.
11. Macarthur BD, Ma’ayan A, Lemischka IR (2009) Systems biology of stem cell fate and cellular reprogramming. Nat Rev Mol Cell Biol 10: 672–681.
12. Rice KL, Hormaeche I, Licht JD (2007) Epigenetic regulation of normal and malignant hematopoiesis. Oncogene 26: 6697–6714.
13. Hutchins AS, Mullen AC, Lee HW, Sykes KJ, High FA, et al. (2002) Gene silencing quantitatively controls the function of a developmental trans-activator. Mol Cell 10: 81–91.
14. Fourel G, Lebrun E, Gilson E (2002) Protosilencers as building blocks for heterochromatin. Bioessays 24: 828–835.
15. Tiwari VK, McGarvey KM, Licchesi JD, Ohm JE, Herman JG, et al. (2008) PcG proteins, DNA methylation, and gene repression by chromatin looping. PLoS Biol 6: e306. doi:10.1371/journal.pbio.0060306.
16. Martinez CA, Arnosti DN (2008) Spreading of a corepressor linked to action of long-range repressor hairy. Mol Cell Biol 28: 2792–2802.
17. Nibu Y, Zhang H, Levine M (2001) Local action of long-range repressors in the Drosophila embryo. EMBO J 20: 2246–2253.
18. Talbert PB, Henikoff S (2006) Spreading of silent chromatin: inaction at a distance. Nat Rev Genet 7: 793–803.
19. Rossi FM, Kringstein AM, Spicher A, Guicherit OM, Blau HM (2000) Transcriptional control: rheostat converted to on/off switch. Mol Cell 6: 723–728.
20. Halme A, Bumgarner S, Styles C, Fink GR (2004) Genetic and epigenetic regulation of the FLO gene family generates cell-surface variation in yeast. Cell 116: 405–415.
21. Domergue R, Castano I, De Las Penas A, Zupancic M, Lockatell V, et al. (2005) Nicotinic acid limitation regulates silencing of Candida adhesins during UTI. Science 308: 866–870.
22. Becskei A, Seraphin B, Serrano L (2001) Positive feedback in eukaryotic gene networks: cell differentiation by graded to binary response conversion. EMBO J 20: 2528–2535.
23. Yeh BJ, Lim WA (2007) Synthetic biology: lessons from the history of synthetic organic chemistry. Nat Chem Biol 3: 521–525.
24. Buetti-Dinh A, Ungricht R, Kelemen JZ, Shetty C, Ratna P, et al. (2009) Control and signal processing by transcriptional interference. Mol Syst Biol 5: 300.
25. Greber D, Fussenegger M (2007) Mammalian synthetic biology: engineering of sophisticated gene networks. J Biotechnol 130: 329–345.
26. Tan C, Marguet P, You L (2009) Emergent bistability by a growth-modulating positive feedback circuit. Nat Chem Biol 5: 842–848.
27. Adkins NL, McBryant SJ, Johnson CN, Leidy JM, Woodcock CL, et al. (2009) Role of nucleic acid binding in Sir3p-dependent interactions with chromatin fibers. Biochemistry 48: 276–288.
28. Sedighi M, Sengupta AM (2007) Epigenetic chromatin silencing: bistability and front propagation. Phys Biol 4: 246–255.
29. Biebricher A, Wende W, Escude C, Pingoud A, Desbiolles P (2009) Tracking of single quantum dot labeled EcoRV sliding along DNA manipulated by double optical tweezers. Biophys J 96: L50–52.
30. McKinney K, Mattia M, Gottifredi V, Prives C (2004) p53 linear diffusion along DNA requires its C terminus. Mol Cell 16: 413–424.
31. Bodnar M, Velazquez JJL (2005) Derivation of macroscopic equations for individual cell-based models: a formal approach. Math Methods Appl Sci 28: 1757–1779.
32. Murray JD (2007) Mathematical biology: I. An introduction. New York (New York): Springer. 555 p.
33. Buhler M, Gasser SM (2009) Silent chromatin at the middle and ends: lessons from yeasts. EMBO J 28: 2149–2161.
34. Chou CC, Li YC, Gartenberg MR (2008) Bypassing Sir2 and O-acetyl-ADP-ribose in transcriptional silencing. Mol Cell 31: 650–659.
35. Fourel G, Magdinier F, Gilson E (2004) Insulator dynamics and the setting of chromatin domains. Bioessays 26: 523–532.
36. King DA, Hall BE, Iwamoto MA, Win KZ, Chang JF, et al. (2006) Domain structure and protein interactions of the silent information regulator Sir3 revealed by screening a nested deletion library of protein fragments. J Biol Chem 281: 20107–20119.
37. Fourel G, Boscheron C, Revardel E, Lebrun E, Hu YF, et al. (2001) An activation-independent role of transcription factors in insulator function. EMBO Rep 2: 124–132.
38. Ratna P, Scherrer S, Fleischli C, Becskei A (2009) Synergy of repression and silencing gradients along the chromosome. J Mol Biol 387: 826–839.
39. Vasiljeva L, Kim M, Terzi N, Soares LM, Buratowski S (2008) Transcription termination and RNA degradation contribute to silencing of RNA polymerase II transcription within heterochromatin. Mol Cell 29: 313–323.
40. Irlbacher H, Franke J, Manke T, Vingron M, Ehrenhofer-Murray AE (2005) Control of replication initiation and heterochromatin formation in Saccharomyces cerevisiae by a regulator of meiotic gene expression. Genes Dev 19: 1811–1822.
41. Xie J, Pierce M, Gailus-Durner V, Wagner M, Winter E, et al. (1999) Sum1 and Hst1 repress middle sporulation-specific gene expression during mitosis in Saccharomyces cerevisiae. EMBO J 18: 6448–6454.
42. Boscheron C, Maillet L, Marcand S, Tsai-Pflugfelder M, Gasser SM, et al. (1996) Cooperation at a distance between silencers and proto-silencers at the yeast HML locus. EMBO J 15: 2184–2195.
43. Klar AJ, Kakar SN, Ivy JM, Hicks JB, Livi GP, et al. (1985) SUM1, an apparent positive regulator of the cryptic mating-type loci in Saccharomyces cerevisiae. Genetics 111: 745–758.
44. Yu Q, Elizondo S, Bi X (2006) Structural analyses of Sum1-1p-dependent transcriptionally silent chromatin in Saccharomyces cerevisiae. J Mol Biol 356: 1082–1092.
45. Bolouri H (2008) Embryonic pattern formation without morphogens. Bioessays 30: 412–417.
46. Halfon MS (2006) (Re)modeling the transcriptional enhancer. Nat Genet 38: 1102–1103.
47. Zhang Y, Lin N, Carroll PM, Chan G, Guan B, et al. (2008) Epigenetic blocking of an enhancer region controls irradiation-induced proapoptotic gene expression in Drosophila embryos. Dev Cell 14: 481–493.
48. Schwartz YB, Pirrotta V (2008) Polycomb complexes and epigenetic states. Curr Opin Cell Biol 20: 266–273.


49. Choi JK, Hwang S, Kim YJ (2008) Stochastic and regulatory role of chromatin silencing in genomic response to environmental changes. PLoS ONE 3: e3002. doi:10.1371/journal.pone.0003002.
50. Xu EY, Zawadzki KA, Broach JR (2006) Single-cell observations reveal intermediate transcriptional silencing states. Mol Cell 23: 219–229.
51. Rando OJ, Paulsson J (2006) Noisy silencing of chromatin. Dev Cell 11: 134–136.
52. Saveliev A, Everett C, Sharpe T, Webster Z, Festenstein R (2003) DNA triplet repeats mediate heterochromatin-protein-1-sensitive variegated gene silencing. Nature 422: 909–913.
53. Raj A, Peskin CS, Tranchina D, Vargas DY, Tyagi S (2006) Stochastic mRNA synthesis in mammalian cells. PLoS Biol 4: e309. doi:10.1371/journal.pbio.0040309.
54. Yin S, Wang P, Deng W, Zheng H, Hu L, et al. (2009) Dosage compensation on the active X chromosome minimizes transcriptional noise of X-linked genes in mammals. Genome Biol 10: R74.
55. Kagansky A, Folco HD, Almeida R, Pidoux AL, Boukaba A, et al. (2009) Synthetic heterochromatin bypasses RNAi and centromeric repeats to establish functional centromeres. Science 324: 1716–1719.
56. Buhler M, Verdel A, Moazed D (2006) Tethering RITS to a nascent transcript initiates RNAi- and heterochromatin-dependent gene silencing. Cell 125: 873–886.
57. Bruggeman FJ, Oancea I, van Driel R (2008) Exploring the behavior of small eukaryotic gene networks. J Theor Biol 252: 482–487.
58. Benecke A (2006) Chromatin code, local non-equilibrium dynamics, and the emergence of transcription regulatory programs. Eur Phys J E Soft Matter 19: 353–366.
59. Hnisz D, Schwarzmuller T, Kuchler K (2009) Transcriptional loops meet chromatin: a dual-layer network controls white-opaque switching in Candida albicans. Mol Microbiol 74: 1–15.
60. Shimazaki H, Shinomoto S (2007) A method for selecting the bin size of a time histogram. Neural Comput 19: 1503–1527.



Widespread Gene Conversion in Centromere Cores

Jinghua Shi1, Sarah E. Wolf1,2, John M. Burke1, Gernot G. Presting3, Jeffrey Ross-Ibarra4, R. Kelly Dawe1,2*

1 Department of Plant Biology, University of Georgia, Athens, Georgia, United States of America, 2 Department of Genetics, University of Georgia, Athens, Georgia, United States of America, 3 Molecular Biosciences and Bioengineering, University of Hawaii, Honolulu, Hawaii, United States of America, 4 Department of Plant Sciences, University of California, Davis, California, United States of America

Abstract

Centromeres are the most dynamic regions of the genome, yet they are typified by little or no crossing over, making it difficult to explain the origin of this diversity. To address this question, we developed a novel CENH3 ChIP display method that maps kinetochore footprints over transposon-rich areas of centromere cores. A high level of polymorphism made it possible to map a total of 238 within-centromere markers using maize recombinant inbred lines. Over half of the markers were shown to interact directly with kinetochores (CENH3) by chromatin immunoprecipitation. Although classical crossing over is fully suppressed across CENH3 domains, two gene conversion events (i.e., non-crossover marker exchanges) were identified in a mapping population. A population genetic analysis of 53 diverse inbreds suggests that historical gene conversion is widespread in maize centromeres, occurring at a rate $> 1 \times 10^{-5}$/marker/generation. We conclude that gene conversion accelerates centromere evolution by facilitating sequence exchange among chromosomes.

Citation: Shi J, Wolf SE, Burke JM, Presting GG, Ross-Ibarra J, et al. (2010) Widespread Gene Conversion in Centromere Cores. PLoS Biol 8(3): e1000327. doi:10.1371/journal.pbio.1000327

Academic Editor: Harmit S. Malik, Fred Hutchinson Cancer Research Center, United States of America

Received October 5, 2009; Accepted February 3, 2010; Published March 9, 2010

Copyright: © 2010 Shi et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This study was supported by grants from the National Science Foundation (0421671, 0421619, 0607123). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

Abbreviations: CENH3, Centromeric Histone H3; ChIP, chromatin immunoprecipitation; FISH, fluorescent in situ hybridization; IDPs, insertion-deletion polymorphisms; LD, linkage disequilibrium

* E-mail: kelly@plantbio.uga.edu

[12,13,14]. Nevertheless it is not accurate to presume that centromeres never experience genetic exchange. Empirical studies have revealed evidence for recombination between sister centromeres [15,16], gene conversion events have been inferred from sequence analysis of mammalian centromeres [17,18,19], and large intrachromosomal rearrangements have been documented in rice centromeres [20,21]. However, despite the extensive circumstantial evidence for genetic exchange among centromeres, the frequency and nature of the recombination has been difficult to measure. Maize centromeres contain a 156 bp tandem repeat known as CentC and an abundant class of Ty3/Gypsy-like transposons [22]. Several subfamilies of these so-called Centromeric Retroelements (CR elements, known as CRM in maize; [23]) exist, with CRM2 being the most abundant in the maize genome [24]. Over time, CR elements insert in and around each other resulting in a nested arrangement [25,26]. Such insertion sites have a high probability of being unique and are generally polymorphic among lines, thereby providing an excellent tool for the genetic analysis of centromeres [27,28]. Here we used transposon display [29] of CRM2 to generate centromere-specific markers in maize. Analysis of segregation in a mapping population, combined with CENH3 ChIP, allowed us to map the functional region of each maize centromere and provide direct evidence for conversion-type genetic exchanges within centromere cores. An analysis of haplotype variation and linkage disequilibrium in a broad panel of maize lines revealed further evidence for a high rate of gene conversion across all centromeres studied, consistent with an important role for stochastic processes in centromere evolution.

Introduction In spite of their highly conserved function as the site of kinetochore assembly and spindle attachment, centromeres are the most dynamic regions of complex genomes. The components, copy number, and structural organization of centromeric DNA are highly divergent even among closely related species [1,2,3]. This apparent conflict between essentiality and sequence dispensability remains one of the major unresolved paradoxes in genetics. It has been hypothesized that the rapid evolution of centromeric DNA is primarily the result of an arms race in which meiotic drive sweeps novel centromeric repeats to fixation while centromeric proteins adapt to suppress this behavior [4]. Alternatively, some authors have argued that the role of selection is minimal and that observed variation can be explained by stochastic events such as mutation and genetic exchange [5,6,7]. Both proposals lack strong empirical support, as centromere drive has only rarely been documented [8], and mutational events are difficult to document in complex repetitive areas. Centromeres are specified epigenetically by the presence of a centromere-specific histone H3 variant, CENH3, which organizes the overlying kinetochores [4]. Kinetochores affect the function and behavior of centromeric DNA in pronounced ways. Perhaps most notable is their effect on crossing over. Cytogeneticists have long known that centromeres severely repress meiotic crossing over [9], and this result has since been confirmed in all species studied [10,11,12]. As a consequence, centromeres are often defined as regions where the frequency of crossovers approaches zero PLoS Biology | www.plosbiology.org


March 2010 | Volume 8 | Issue 3 | e1000327


Centromere Evolution

proportion of these and exhibit very low transposition rates as judged by the small proportion of elements with insertion times in the past 75,000 years [30]. CRM2 thus has the features of an excellent genetic marker, being conserved enough to easily identify while still providing substantial polymorphism.

Transposon display (known as TD; see [29]) makes it possible to capture such transposon-induced polymorphisms. By pairing a transposon-specific primer with a restriction site adapter, presence or absence of a particular insertion can be scored by resolving PCR products on a polyacrylamide gel. When we used TD to display all the CRM2 elements in maize, we found that the number of products exceeded the resolution of our gel assays. To make the results manageable, we therefore added three selective bases to the adapter primer such that only 1/64 of the total number of bands was amplified in any given experiment. The resulting data suggest that 80.3% of the CRM2 bands are polymorphic between B73 and Mo17 (74 of 376 observed bands did not segregate).

To map CRM2 polymorphisms within centromeric regions, we scored a total of 257 CRM2 markers in 93 recombinant inbred lines from the maize IBM mapping population [31]. Of these, 238 mapped to 10 positions, each corresponding to a different maize centromere. The remaining 19 mapped at least one centimorgan outside of a centromere cluster and were classified as pericentromeric. The final data set revealed that the distribution of CRM2 markers is non-uniform among centromeres: there are 30 independent CRM2 markers on B73 centromere 2, for example, but only one marker on centromere 9. This result might be expected, as prior evidence has suggested repeat variation among maize centromeres [32].
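The arithmetic behind the selective-base step and the polymorphism estimate can be checked with a short sketch. It assumes that each selective base restricts amplification to one of four equally likely nucleotides, an idealization that the text does not state; the function names are illustrative only.

```python
def selective_fraction(n_selective_bases: int) -> float:
    """Fraction of fragments expected to amplify when n selective bases
    are added to the adapter primer (assumes 4 equally likely bases)."""
    return 1 / 4 ** n_selective_bases


def percent_polymorphic(total_bands: int, non_segregating: int) -> float:
    """Percentage of observed bands that segregate between the two parents."""
    return 100 * (total_bands - non_segregating) / total_bands


if __name__ == "__main__":
    print(selective_fraction(3))                   # 0.015625, i.e. 1/64
    print(round(percent_polymorphic(376, 74), 1))  # 80.3
```

With three selective bases the expected fraction is 1/4³ = 1/64, and 302 segregating bands out of 376 gives the 80.3% quoted above.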
An analysis of a B73/Mo17 hybrid line by fluorescent in situ hybridization (FISH) supports the interpretation that there is a rough correspondence between the number of markers recovered by CRM2 display and the intensity of CRM2 hybridization signal (Figure 1). Recombinant inbred lines should be homozygous for markers from only one parent at the vast majority of loci. However, we also detected lines that contained markers characteristic of both (27

Author Summary Centromeres, which harbor the attachment points for microtubules during cell division, are characterized by repetitive DNA, paucity of genes, and almost complete suppression of crossing over. The repetitive DNA within centromeres appears to evolve much faster than would be expected for genetically inert regions, however. Current explanations for this rapid evolution tend to be theoretical. On the one hand there are arguments that subtle forms of selection on selfish repeat sequences can explain the rapid rate of change, while on the other hand it seems plausible that some form of accelerated neutral evolution is occurring. Here, we address this question in maize, which is known for its excellent genetic mapping resources. We first developed a method for identifying hundreds of single copy markers in centromeres and confirmed that they lie within functional domains by using a chromatin immunoprecipitation assay for kinetochore protein CENH3. All markers were mapped in relation to each other. The data show that, whereas classical crossing over is suppressed, there is extensive genetic exchange in the form of gene conversion (by which short segments of one chromosome are copied onto the other). These results were confirmed by demonstrating that similar short exchange tracts are common among the centromeres from multiple diverse inbred lines of maize. Our study suggests that centromere diversity can be at least partially attributed to a high rate of previously ‘‘hidden’’ genetic exchange within the core kinetochore domains.

Results

Generating Unique Centromeric Markers Using CRM2 Display

Maize centromeres contain hundreds of retrotransposons of the CRM family, with clearly orthologous subfamilies present in rice [30]. Elements of the CRM2 subfamily account for a large

Figure 1. Correspondence between CRM2 marker number and CRM2 FISH intensity. Metaphase chromosomes from a B73/Mo17 hybrid line (from a single cell). CRM2 LTR and telomeres are shown in green, CentC and the knob 180 bp repeat are shown in red, and chromosomes are shown in blue. The lower panel shows CRM2 FISH signal (in white), and beneath each centromeric region is the total number of CRM2 TD markers recovered from that centromere. doi:10.1371/journal.pbio.1000327.g001


centromeres) or neither of the parental centromeres (6 centromeres). The former could be the result of residual heterozygosity, whereas the latter was presumed to represent contamination during the propagation of the lines. A combination of flanking centromeric markers and FISH (Figure S1) allowed us to confirm these expectations and remove the heterozygous and/or contaminant centromeres from consideration (Table S1). Overall centromeric heterozygosity was 2.15%, in line with expectations (2.5%) from a 6× self-crossed population.

events (Figure S2). It is also possible (though much less likely) that these events represent exchange between non-homologous centromeres. Although we have not demonstrated that the observed marker exchanges are mechanistically gene conversion in the strictest sense, we will refer to them as conversion events throughout. Based on these observations, we can estimate that the IBM lines sustained a centromeric gene conversion rate of 1.86 × 10⁻⁴ conversion events per marker per generation (see Materials and Methods).

CRM2 Markers Interact with CENH3

CENH3 chromatin is not continuously distributed over centromeric domains, and any assay of common centromere repeats will thus provide only a partial view of the functional centromere/kinetochore regions. To identify CRM2 markers that lie within functional regions, we added a chromatin immunoprecipitation (ChIP) step to the protocol (Figure 2). Centromeric chromatin was precipitated with anti-CENH3 antibodies, the DNA purified from its associated chromatin, and the sample further processed for CRM2 display. Of 212 markers scored by ChIP, 122 were precipitated with CENH3 (57.5%), 40 were not precipitated with CENH3, and 50 gave inconsistent results among replicates. As expected, none of the 19 known pericentromeric bands was immunoprecipitated by CENH3 antibodies. These results are consistent with prior work showing that roughly 30% of maize CRM sequences can be immunoprecipitated by CENH3 antisera [23] and that a visible proportion of the CRM elements in maize are not associated with CENH3 [33].

Linkage Disequilibrium (LD) in Maize Centromeres

Direct observation of marker exchange in our mapping population confirms the existence of conversion events, but population genetic data are required to assess the historical impact that such processes may have had on maize centromeres. To this end, we genotyped a set of CRM2 TD markers in a panel of 53 inbred lines, including a 50-line core set representative of a broad base of maize genetic diversity [34]. Each line was genotyped with 75 markers derived from 10 centromeres (B73 centromeres 1, 2, 3, 5, 6, 8, and Mo17 centromeres 4, 7, 8, and 9; Figure 4). When scoring CRM2 markers in diverse inbreds, there is a possibility that unrelated bands might co-migrate with the B73- or Mo17-derived bands and thus be scored as false positives. To investigate this possibility, we confirmed all bands for a set of 12 sequenced markers on centromere 2 [24] using a second round of genotyping using 4 bp selective base primers. The data revealed that 98.2% of the genotypes (556 of 566) from centromere 2 had been scored correctly. The remaining data are reported as originally called with 3 bp primers and interpreted with an assumed false positive rate of 1.8% (Figure 4). Because all of the assayed lines are inbred, it is reasonable to interpret our multi-locus genotypes as haplotypes for population genetic analysis, even though the markers are genetically dominant. Initial investigation of average pairwise LD among markers, as measured by the ZnS statistic [35], revealed that observed haplotype configurations at 7 of the 9 centromeres cannot be explained by a model lacking historic genetic exchange (Table 1). To further test for evidence of genetic exchange, we applied the four-gamete test [36] to estimate the minimum number of genetic exchanges (Rmin) required to explain the observed data (assuming no recurrent mutation). As shown in Table 1, all nine centromeres were estimated to have nonzero Rmin (mean = 5.6), providing strong evidence for some form of genetic exchange. 
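The two statistics used here can be illustrated with a minimal sketch for presence/absence haplotypes. It is not the libsequence code used in the study: the Rmin function below is a standard interval-based lower bound built on the four-gamete test, it assumes the markers are supplied in their physical order, and all function names are hypothetical.

```python
from itertools import combinations


def site_columns(haplotypes):
    """Transpose haplotypes (one row per line) into per-marker columns."""
    return [list(col) for col in zip(*haplotypes)]


def has_four_gametes(site_a, site_b):
    """True if all four allele combinations (00, 01, 10, 11) are observed."""
    return len(set(zip(site_a, site_b))) == 4


def rmin(haplotypes):
    """Lower bound on the number of exchanges: the largest set of
    non-overlapping marker intervals that each fail the four-gamete test."""
    sites = site_columns(haplotypes)
    intervals = [(i, j) for i, j in combinations(range(len(sites)), 2)
                 if has_four_gametes(sites[i], sites[j])]
    intervals.sort(key=lambda iv: iv[1])  # greedy scan by right endpoint
    count, last_right = 0, 0
    for left, right in intervals:
        if left >= last_right:
            count += 1
            last_right = right
    return count


def r_squared(site_a, site_b):
    """Squared allelic correlation between two biallelic 0/1 markers."""
    n = len(site_a)
    p_a, p_b = sum(site_a) / n, sum(site_b) / n
    p_ab = sum(a * b for a, b in zip(site_a, site_b)) / n
    d = p_ab - p_a * p_b
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def zns(haplotypes):
    """Average pairwise r^2 over all pairs of segregating markers."""
    sites = [s for s in site_columns(haplotypes) if 0 < sum(s) < len(s)]
    pairs = list(combinations(sites, 2))
    if not pairs:
        raise ValueError("need at least two segregating markers")
    return sum(r_squared(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    toy = [(0, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 1)]  # made-up haplotypes
    print(rmin(toy), zns(toy))
```

In the toy data, markers 1 and 3 are perfectly associated while marker 2 shows all four gametes with each of them, so the bound is two exchanges and the average pairwise r² is 1/3.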
These Rmin values, moreover, are likely underestimates of the actual number of exchanges that have occurred at each centromere, as our markers cover only a small region of each centromere and Rmin is an inherently conservative statistic [36]. Genetic exchanges such as those measured by Rmin can be caused by either crossing over or gene conversion. These two types of exchange result in different predictions about the relationship between LD and physical distance. Crossing over produces a negative correlation between LD and distance. For instance, LD on maize chromosome arms decays to negligible levels within 2 kb [37]. In contrast, because gene conversion tracts are usually short [38] and do not affect flanking markers, gene conversion is not expected to produce a relationship between marker distance and linkage. We measured the relationship between LD and distance on centromere 2 (Figure 5), which has been fully sequenced [24]. Pairwise LD estimates reveal a block of high LD involving 3 markers spanning the only region of CentC repeats on this centromere ([24]; marked as a box on Figure 5B), but the data reveal no evidence for a correlation between LD and distance (Pearson’s correlation coefficient of 0.11 does not differ from randomly permuted datasets; p = 0.32). This pattern differs

Sequence Conversion Events within Centromeres

The IBM population presents a unique opportunity for identifying rare genetic exchanges within centromere cores. Since crossing over is suppressed in centromeres, the markers from a single centromere haplotype should always be inherited as a unit. While this is true for the great majority of centromeres, we also detected aberrant inheritance patterns. These fell into two categories: loss of a marker from a known centromere haplotype and gain or transfer of a marker from one haplotype to another (Figure 3). Marker loss is a negative result and difficult to confirm; such events may in principle represent deletions but could potentially represent technical errors and were thus not pursued further. In contrast, there are several definitive ways to confirm the gain of a marker in our scoring system, and we focused further analyses on these markers.

There were four cases of marker gain, each potentially representing a genetic exchange event. We first cloned and sequenced each affected band from its parental line. We then performed a new round of TD using sequence-specific primers. In two such cases, the originally scored gained bands were not observed using the sequence-specific primers, indicating that the bands likely represent new polymorphisms that happened to comigrate with one of the mapped markers. Two other bands, B73_8_ACC165 and Mo17_5_TCG264, were confirmed by sequence to represent the parental markers. At least one of these markers (B73_8_ACC165) lies within the functional CENH3 core as assayed by ChIP display. The second marker (Mo17_5_TCG264) did not precipitate with CENH3 antisera in our hands, though we note that a negative result by ChIP does not necessarily imply that the marker is not centromeric.
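The marker-gain pattern described above (a complete haplotype from one parent plus an isolated marker from the other) can be expressed as a simple set comparison. This is only a sketch: the `max_gain` threshold and every marker name other than B73_8_ACC165 are hypothetical, and real scoring would still need the sequence-level confirmation described in the text.

```python
def gained_markers(complete_parent, other_parent, observed, max_gain=1):
    """Return markers from `other_parent` found in a line that otherwise
    carries the complete `complete_parent` haplotype; empty set otherwise.

    All arguments are sets of marker names for a single centromere.
    """
    if not complete_parent <= observed:
        return set()  # the line does not carry the full haplotype
    extra = observed & other_parent
    # Many markers from the other parent suggest heterozygosity, not gain.
    return extra if 0 < len(extra) <= max_gain else set()


if __name__ == "__main__":
    mo17_cen8 = {"Mo17_8_m1", "Mo17_8_m2", "Mo17_8_m3"}  # hypothetical names
    b73_cen8 = {"B73_8_ACC165", "B73_8_m1"}              # second name hypothetical
    line_with_gain = mo17_cen8 | {"B73_8_ACC165"}
    print(gained_markers(mo17_cen8, b73_cen8, line_with_gain))
```

A line carrying every marker from both parents returns an empty set under this rule, which matches the separate treatment of heterozygous centromeres.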
An analysis of flanking markers revealed that no crossing over was associated with either B73_8_ACC165 or Mo17_5_TCG264, ruling out the possibility that they represent crossing over at the edge of the affected centromeres and indicating that they represent gene conversion, double crossover, or similar sequence exchange

inconsequential. These results are thus inconsistent with the observed genetic exchange being the result of canonical crossing over. We therefore proceeded to estimate the rate of gene conversion on each centromere using two independent methods (Table 1). The first is based on the premise that gene conversion will increase the number of multilocus haplotypes in a sample. Coalescent simulations (see Materials and Methods; Figure 6) were used to estimate the gene conversion rate required to achieve the observed number of haplotypes. The resulting data suggest a mean estimate of 3.7 × 10⁻⁵ conversion events per marker per generation and allow us to statistically reject a model with no gene conversion for all nine centromeres at p < 0.05. Second, we used a composite likelihood method [39] to directly estimate gene conversion rates for each centromere. This second approach reveals similar rates of conversion across all nine centromeres, averaging ~1 × 10⁻⁵ conversion events per marker per generation.

Discussion

Our data indicate that gene conversion is common within centromeres and may play a fundamental role in determining the dynamics and distribution of centromere repeats. This conclusion is based on three primary lines of evidence. First, our mapping data provide what is to our knowledge the only experimental evidence for centromeric gene conversion. Indeed, two independent conversion events were identified in 93 recombinant inbred lines using a set of 238 CRM2 markers, corresponding to a rate of 1.86 × 10⁻⁴ exchanges per marker per generation. The second line of evidence comes from LD analysis of 75 markers typed in a set of 53 diverse inbred lines. These data show patterns consistent with genetic exchange, including unusually low LD and the clear presence of recombinant haplotypes (nonzero Rmin), but show no decay of LD with distance as would be expected in the presence of crossing over. Finally, two independent population genetic methods were used to directly estimate centromeric gene conversion, resulting in remarkably similar rates of ~1 × 10⁻⁵ conversions per marker per generation. It is too early to tell how rates of gene conversion in centromeres compare to other regions of the maize genome, but one estimate of gene conversion at the maize anthocyaninless1 locus (~3 × 10⁻⁵/marker/generation [40]) suggests they may be of a similar order of magnitude.

It has been hypothesized that centromere evolution in eukaryotes with asymmetric meiosis has been primarily governed by an arms race in which meiotic drive occasionally sweeps novel centromeric repeats to fixation [4]. While the extreme LD observed around a short tract of CentC on centromere 2 may hint at an evolutionary history consistent with these ideas (Figure 5B), our finding of widespread gene conversion explains how high levels of diversity may be observed even in yeast where meiotic drive is a less likely explanation [7].
Sequence data from mammalian centromeres are further consistent with this view, suggesting in several studies that gene conversion has contributed to extant centromere variation and the production of novel higher order repeat arrays [17,18,19]. If centromeric gene conversion is indeed common in maize, yeast, and humans, it seems reasonable to hypothesize that gene conversion is an important process within the centromere cores of all eukaryotes.

Figure 2. ChIP display. The image shows CRM2 elements labeled with P33 on a polyacrylamide gel. The left panel shows results from chromatin immunoprecipitation with controls: pos, B73 nuclei used for the ChIP experiment; sup, supernatant that did not bind to CENH3 antibodies; C, CENH3-bound markers; neg, no antibody control (shows non-specific binding to the sepharose beads used for precipitation). The right panel shows an annotated comparison between sup and C lanes. The chromosomal locations of the bands precipitated are indicated. The dashes next to the S lane denote non-precipitated bands. doi:10.1371/journal.pbio.1000327.g002

Materials and Methods

dramatically from what has been observed in the rest of the genome (Figure 5, inset) [37]. Moreover, forcing the data to fit a model of nonlinear decay [37] results in an estimate of crossing over of 3.94 × 10⁻¹² per bp per generation, so low as to be

Genetic Stocks

A ninety-four line IBM DNA Kit, provided by the Maize Genetics Cooperation Stock Center (http://www.maizemap.org/94_ibm.htm), was used for CRM2 display. IBM3 was excluded from the analysis because seven centromeres were heterozygous. Additional accessions of IBM lines used for confirmation and further ChIP and FISH analysis were obtained from the Maize Genetics COOP stock center (http://www.maizegdb.org/stock.php). A set of 53 maize inbred lines, including the majority of a 50-line core set [34] with additional lines within NAM (nested association mapping) founder lines [41], were chosen to represent the genetic diversity for LD analysis. The inbreds assayed were B73, Mo17, A441, A632, B37, B57, B96, B97, C103, CI.7, CML5, CML52, CML61, CML69, CML77, CML103, CML220, CML228, CML247, CML254, CML261, CML277, CML311, CML321, CML322, CML328, CML333, F2, Hi27, HP301, I137TN, IDS28, IL14H, K55, Ki3, Ki11, KY21, M37w, Mo18w, Ms71, Nc304, Nc360, Nc348, Nc358, Oh7B, Oh43, Os420, P39, Tx303, Tzi8, Tzi9, Va85, and W401. All were obtained from the North Central Regional Plant Introduction Station, in Ames, Iowa. DNA was extracted from 3-wk-old seedlings using a modified CTAB protocol [42].

Figure 3. The B73_8_ACC165 gene conversion event. This figure illustrates marker gain as primary data; see also Figure S2 for a visualization of how the data are interpreted. Panels show gel images acquired using fluorescent (FAM) labeling and capillary electrophoresis (images produced by GeneMarker software). IBM10 contains all Mo17 markers from centromere 8 as well as the centromere 8 B73_8_ACC165 marker (B73 markers are labeled in blue and Mo17 markers are labeled in red). IBM11 and IBM12 contain normal Mo17 and B73 centromeres, respectively. Only a subset of the (total 30) markers for centromere 8 is shown; see Figure S2 for the complete list. doi:10.1371/journal.pbio.1000327.g003

Genetically Mapping CRM2 Markers

Mapping data were initially sent to a community IBM mapping service (CIMDE), which constructed a linkage map using a two-point mapping method from a framework of 580 loci. After obtaining rough positions, we constructed a finer centromere map for each chromosome using MapMaker Version 3.0 [43]. In each centromere map, mapping scores for 20 flanking markers from the IBM2 2008 Neighbors linkage interpretation (www.maizegdb.org) were added to the file containing CRM2 marker scores. The closest IBM2 core bin markers were added as the first and last marker for each centromere map. In addition, we included as many “skeleton” markers (ISU map4, [13]) as possible. The CRM2 markers were then placed into the centromere framework using a multi-point method (the “try” MapMaker command).

Identifying CENH3-Associated Markers by ChIP Display

Native ChIP was carried out as described previously [44] with minor modifications. Chromatin was extracted from young leaves (~8–15 cm) or young roots (~1 wk after germination). RNase-free DNase I (Promega, Madison, WI, USA) was utilized for chromatin digestion. Chromatin was digested to ~300–3,000 bp fragments as judged by agarose electrophoresis. After immunoprecipitation with anti-CENH3 antisera [23], the supernatant (unbound) and IP (bound) fractions were purified with a PCR purification kit (Invitrogen, Carlsbad, CA, USA) and used for CRM2 transposon display. Input DNA (before adding antibodies) was used as a positive control and a treatment without antibodies (No IgG) was used as a negative control (Figure 2). ChIP display was replicated three times for both B73 and Mo17; bands that were amplified in the IPed DNAs from all three experiments were considered to be associated with centromere cores.
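The replicate rule can be written as a small classifier. This is a minimal sketch: the rule for CENH3 association (amplified in all three IP replicates) comes from the text, while treating "absent in all replicates" as not precipitated and any mixed outcome as inconsistent is an assumption based on the three categories reported in the Results. The marker names are hypothetical.

```python
from collections import Counter


def classify_marker(replicate_calls):
    """Classify one marker from presence (True) / absence (False) calls
    in the IP lane of each replicate ChIP display experiment."""
    if all(replicate_calls):
        return "CENH3-associated"
    if not any(replicate_calls):
        return "not precipitated"
    return "inconsistent"


def tally(calls_by_marker):
    """Count markers in each category."""
    return Counter(classify_marker(c) for c in calls_by_marker.values())


if __name__ == "__main__":
    calls = {  # hypothetical markers and calls
        "marker_a": (True, True, True),
        "marker_b": (False, False, False),
        "marker_c": (True, False, True),
    }
    print(tally(calls))
    # Share of scored markers reported as precipitated with CENH3.
    print(round(100 * 122 / 212, 1))  # 57.5
```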

CRM2 Transposon Display

Transposon display was carried out as described elsewhere [24,29]. In this method, DNA is digested with BfaI and the samples PCR-amplified using CRM2 primers and adapter primers designed to anneal to the cleaved BfaI site. The method involves primary and selective amplification steps with different (nested) CRM2 primers being used in each step. The primers for primary amplification were CRM2_R1 (5′-GAGGTGGTGTATCGGTTGCT) and BfaI + 0 (5′-GACGATGAGTCCTGAGTAG), and for selective amplification were P33- or FAM-labeled CRM2_R2 (5′-CTACAGCCTTCCAAAGACGC) and BfaI + 3 selective bases (where different bases were added to the BfaI + 0 primer). A 58°C annealing temperature was used for the selective amplification. P33-labeled PCR products were separated on 6% polyacrylamide gels and FAM-labeled PCR products were separated by capillary electrophoresis and interpreted using GeneMarker software (SoftGenetics, LLC).
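The full set of BfaI + 3 adapter primers can be enumerated programmatically. The sketch below takes the BfaI + 0 sequence from the text and assumes that the three selective bases are appended at the 3′ end of that primer, which is the usual arrangement for selective amplification but is not spelled out here; the function name is hypothetical.

```python
from itertools import product

BFAI_PLUS_0 = "GACGATGAGTCCTGAGTAG"  # BfaI + 0 adapter primer given in the text


def selective_primers(base_primer, n_selective=3):
    """Map each selective-base suffix to the adapter primer carrying it
    (assumes the selective bases extend the primer at its 3' end)."""
    return {
        "".join(suffix): base_primer + "".join(suffix)
        for suffix in product("ACGT", repeat=n_selective)
    }


if __name__ == "__main__":
    primers = selective_primers(BFAI_PLUS_0)
    print(len(primers))     # 64 primer variants, one per 3-base suffix
    print(primers["ACC"])   # GACGATGAGTCCTGAGTAGACC
```

Each of the 64 variants would amplify a different subset of the CRM2 bands, in line with the 1/64 fraction described in the Results.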

Recovery and Sequencing of CRM2 Markers

Sixty-four CRM2 bands were excised from TD gels and reamplified with primer set BfaI+0 and CRM2_R2. The PCR products were purified using QIAGEN (Valencia, CA) Gel Purification kit and were either directly sequenced or cloned into

Figure 4. CRM2 marker data from a set of diverse inbreds. Panels A and B together represent the entire data set. Columns show the 53 inbreds scored, while rows show the presence (black) or absence (white) of 75 CRM2 TD markers for the indicated centromeres. The columns containing B73 and Mo17 reference data are highlighted in grey. For centromere 2, only sequence-confirmed data are shown, whereas all other data were interpreted with a presumed false positive rate of 1.8%. doi:10.1371/journal.pbio.1000327.g004


Table 1. Linkage disequilibrium and gene conversion rates.

Centromere   Markers   Rmin   ZnS        N    Gene Conversion Rate¹
                                              Simulation   Likelihood
1            5         2      0.586      7    1.04         0.35
2            14        8      0.386**    13   1.04         0.91
3            10        8      0.326**    24   5.09         1.40
4            7         5      0.379**    14   4.09         1.42
5            5         4      0.487      10   0.461        0.90
6            5         4      0.320**    12   8.48         0.85
7            6         4      0.282**    12   8.18         1.36
8_B73        13        7      0.445*     19   2.12         0.36
8_Mo17       6         5      0.325**    14   3.64         0.91
8²           19        13     0.249**    33   3.64         0.90
9            4         2      0.312**    6    1.62         0.93

N = number of haplotypes.
¹ Rates presented as conversions per 10⁵ markers.
² All centromere 8 data combined.
*p < 0.05, **p < 0.001.
doi:10.1371/journal.pbio.1000327.t001

a TOPO TA vector (Invitrogen, Carlsbad, CA) and then sequenced. As controls for the ChIP display method, 31 bands were cloned from both genomic DNA and ChIP display (IP) lanes, and the resulting sequences were found to be identical. All sequenced markers are available in GenBank as accessions GF099546–GF099610. Markers that were shown to interact with CENH3 are annotated with the statement “this sequence interacts with Centromeric Histone H3 (CENH3) and is within the functional centromere core.” We note that a subset of the sequenced markers was also used to construct the physical map of centromeres 2 and 5 [24].

Identifying and Confirming Heterozygous Centromeres in IBM Lines

Heterozygous centromeres were first identified as cases where markers from both parents were present for a single centromere. A total of 27 such examples were identified. Seven heterozygous centromeres were found in a single line (IBM3) that was subsequently removed as a recent outcross contaminant. We made an effort to confirm as many of the remaining 20 heterozygous centromeres as possible using codominant insertion-deletion polymorphisms (IDPs; [13]) to confirm heterozygosity at closely linked flanking markers (16 centromeres) or by FISH of CentC content (one centromere, Figure S1). We were also able to eliminate as contaminants six centromeres that lacked markers from either parent and were together responsible for all of the nonparental bands observed on TD gels. Although they lacked B73 or Mo17 markers, four of the contaminant centromeres were shown to contain abundant CentC and CRM and one line segregated for knobs not present in either parent (Figure S1). The IDPs scored were IDP3936, IDP592, and IDP825 (chromosome 2); IDP3945 and IDP1433 (chromosome 3); IDP642, IDP476, and IDP625 (chromosome 4); IDP1408, IDP359, and IDP1607 (chromosome 5); IDP3788, IDP3799, IDP2581, IDP680, and IDP3887 (chromosome 6); IDP3795, IDP3810, and IDP3994 (chromosome 7); IDP334, IDP327, IDP811, and IDP88 (chromosome 8); and IDP4151, IDP8457, and IDP4017 (chromosome 9).

Figure 5. Linkage disequilibrium in centromere 2. (A) Pairwise LD plotted against distance, fit to a decay function [37,49] using a value of ρ = 8.81 × 10⁻⁷. Inset shows the decay over the first 5 kb (in black) and the same function fit using the genome-wide median of ρ (in grey) [51]. (B) Heatmap of pairwise LD. Lighter colors show higher LD. The black box demarcates three markers that show high LD and flank the only cluster of CentC repeats on this centromere (the 180 kb region between positions 89.88 and 90.06 Mb on the physical map). doi:10.1371/journal.pbio.1000327.g005

Confirming Gene Conversion Events

Two gene conversion events identified by B73_8_ACC165 and Mo17_5_TCG264 were confirmed in several experiments using different DNA samples and primers. The most definitive experiment for marker B73_8_ACC165 involved a highly specific primer with 11 selective bp. With this primer, the segregation was identical to the original observation, such that RIL IBM10, which contains the complete Mo17 centromere 8 haplotype, also

FISH

FISH on mitotic cells was performed as described previously [32]. The following four repetitive DNA sequences were included in the probe cocktail: subtelomeric 4-12-1 (FITC labeled), CRM2 LTR (FITC labeled), CentC (Texas Red labeled), and knob180 (Texas Red labeled). The clones of 4-12-1, CentC, and knob180 were generously provided by Dr. James Birchler (University of Missouri). The CRM2 LTR was PCR amplified from genomic DNA using the following primer set: forward, 5′-TCGTCAACTCAACCATCAGGT, and reverse, 5′-GCAAGTAGCGAGAGCTAAACTTGA. All images were captured and processed using a Zeiss Axio Imager microscope and SlideBook 4.0 software (Intelligent Imaging Innovations, Denver, CO, USA).

Estimation of Gene Conversion Rate in IBM Lines

Assuming that all markers have equal likelihood of being involved in an exchange event, and taking into account the decrease in heterozygosity during the 11 generations involved in preparing the mapping population, we can estimate the rate of gene exchange as $x/(MG)$, where $x$ is the observed number of exchanges, $M$ the total number of markers, and $G$ the effective number of generations available for exchange. We observed two exchange events, and scored 238 markers in each of the 93 lines remaining after removing contamination. A further 696 markers were removed because of contamination or inconsistent banding patterns, such that the total number of markers was M = 21,438. In a randomly mating population, all 11 generations would provide opportunities for exchange. But as RILs are inbred, each generation possesses less heterozygosity and thus fewer opportunities to observe an exchange event. Correcting for this, the effective number of generations is $G = 1 + \sum_{n=1}^{11} (1/2)^{n}$, and the total rate is 1.86 × 10⁻⁴ exchanges per marker per generation.
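The bookkeeping behind M and the general form of the estimator can be sketched as follows. The summation used for G is one reading of the formula above, whose typesetting was damaged in the source, so the value it returns is an assumption and the sketch is not claimed to reproduce the reported rate. The function names are hypothetical.

```python
def total_markers(markers_per_line, n_lines, n_removed):
    """Total marker observations M after discarding unusable scores."""
    return markers_per_line * n_lines - n_removed


def effective_generations(n_generations):
    """G = 1 + sum_{n=1}^{N} (1/2)^n, assuming heterozygosity halves each
    generation (one reading of the damaged formula in the text)."""
    return 1 + sum(0.5 ** n for n in range(1, n_generations + 1))


def exchange_rate(x, m, g):
    """Exchanges per marker per generation, x / (M * G)."""
    return x / (m * g)


if __name__ == "__main__":
    m = total_markers(238, 93, 696)
    print(m)  # 21438, matching M = 21,438 in the text
    print(effective_generations(11))
```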

LD and Simulation

Calculation of Rmin, pairwise r², and ZnS utilized code from the analysis and msstats packages of the libsequence C++ library [46]. We modeled the decay of LD with distance [37] and tested the significance of the association between r² and distance along centromere 2 with 1,000 pairwise permutations. The significance of the ZnS statistic for each centromere was compared to results from 1,000 coalescent simulations under a bottleneck model (similar to [47]) with no recombination. Simulations were performed in ms [48] with the command line: ms 53 1000 -t 500 -r 0 1000000 -c c 1000 -eN 0.00556 0.00544 -eN 0.00611 1.
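A permutation test of the association between r² and distance can be sketched in a few lines. This is an illustration only: it shuffles the r² values across marker pairs and uses a two-sided comparison of Pearson correlations, which may differ from the exact permutation scheme used in the study, and the input values in the demo are made up.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def permutation_test(distances, r2_values, n_perm=1000, seed=0):
    """Return the observed correlation and a two-sided permutation p-value
    obtained by shuffling r^2 values relative to the distances."""
    rng = random.Random(seed)
    observed = pearson(distances, r2_values)
    shuffled = list(r2_values)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(distances, shuffled)) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value above zero.
    return observed, (extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    dist = list(range(1, 11))               # made-up pairwise distances
    r2 = [1.0 - 0.05 * d for d in dist]     # made-up, strictly decaying LD
    print(permutation_test(dist, r2))
```

A p-value near 0.32, as reported for centromere 2, would mean that about a third of the shuffled data sets show a correlation at least as strong as the observed one.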

Figure 6. Haplotype estimation of gene conversion. Shown is the expected number of haplotypes observed under varying levels of gene conversion (c) from coalescent simulations of a maize domestication bottleneck. The solid line indicates the mean number of haplotypes, and the shaded region encloses the empirical 95% confidence intervals. Horizontal dotted lines represent the number of haplotypes observed from the centromeres indicated (m is the number of markers in that centromere). The most probable gene conversion rates occur where the dotted lines intersect with the solid lines. The last panel shows the outcome if all centromere 8 data are considered together (from both B73 and Mo17, such that m = 19). doi:10.1371/journal.pbio.1000327.g006
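The haplotype counts summarized in Figure 6 could be tallied from simulated marker data along the following lines. This is a hedged sketch rather than the authors' libsequence programs. The 0/1 presence/absence encoding and the helper names `add_false_positives` and `count_haplotypes` are assumptions. The 1.8% false positive rate is the value given in the Methods.

```python
import random


def add_false_positives(haplotypes, rate, rng):
    """Flip marker absence (0) to presence (1) with probability `rate`."""
    return [
        tuple(1 if (allele == 0 and rng.random() < rate) else allele for allele in hap)
        for hap in haplotypes
    ]


def count_haplotypes(haplotypes):
    """Number of distinct multilocus haplotypes in the sample."""
    return len(set(tuple(hap) for hap in haplotypes))


if __name__ == "__main__":
    rng = random.Random(2010)  # fixed seed so the example is repeatable
    simulated = [(0, 1, 0), (0, 1, 0), (1, 1, 0), (1, 0, 1)]
    noisy = add_false_positives(simulated, 0.018, rng)
    print(count_haplotypes(simulated), count_haplotypes(noisy))
```

Repeating this over many simulated samples for each value of c would give the mean and empirical interval that the figure compares with the observed counts.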

Estimation of Gene Conversion in Diverse Inbreds

We used two independent methods to estimate gene conversion rates. First, composite likelihood methods [39], as implemented in the program maxhap (http://home.uchicago.edu/~rhudson1/source/maxhap.html), were used to estimate the population gene conversion rate $c$ ($= 4N_e g$), where $g$ is the gene conversion rate per bp per generation. We assumed a gene conversion tract length of 1 kb, a population recombination rate of $\rho = 4N_e r = 10^{-5}$ per kb, where $r$ is the recombination rate per bp per generation, and that markers were evenly spaced across the centromere. Centromere sizes were based on map estimates [24]. Physical map positions from centromere 2 were utilized to verify that assumptions of order and distance had little effect on the final rate estimation (unpublished data). Using maxhap, we calculated the likelihood of different rates across a grid of 10,000 values of $c/\rho$ from 1 to $10^6$

contains marker B73_8_ACC165 from B73 centromere 8. For marker Mo17_5_TCG264, we directly sequenced the aberrantly scored bands in the affected RILs IBM24 and IBM54. Both lines contain the complete B73 centromere 5 haplotype as well as the Mo17_5_TCG264 marker from Mo17 centromere 5. We ruled out that crossover had occurred coincidently with marker gain using our established centromere map positions [24]. For centromere 5 we used the following markers: umc40, mmp60, rz87 - Cent5 - umc1591, umc2302, and umc1060. For centromere 8 we used bnlg1834, umc1157, umc1904 - Cent8 - AY110113, gpm572b, and IDP334. Map scores for the flanking gene markers have been previously published [13,45] and were obtained from maizegdb.org. PLoS Biology | www.plosbiology.org


March 2010 | Volume 8 | Issue 3 | e1000327


Centromere Evolution

per kb, reporting the value of c which maximized the likelihood for each centromere. Our second estimator of gene conversion compared the number of multilocus haplotypes present in a sample of centromere markers to coalescent simulations under a demographic model of maize domestication. We simulated chromosomes nearly devoid of recombination across a grid of gene conversion rates, performing 1,000 coalescent simulations for each value investigated. Our model closely followed prior work [47] in assuming an ancestral diploid population size of 450,000 that underwent a domestication bottleneck of 2,450 individuals, starting 11,000 years ago and lasting 1,000 years. Simulations were performed in MaCS [50] using the following command line: macs 53 10e6 -t 10e-3 -r 10e-6 -c c 1000 -eN 0.00556 0.00544 -eN 0.00611 1 -h 10e5. Custom programs built using the libsequence C++ library [46] were used to ascertain markers using a scheme mirroring our TD methods, to choose a random subset of markers for comparison to different centromeres, to incorporate a false positive error rate of 1.8% (i.e., randomly change marker absence to marker presence with a probability of 1.8%), and to count haplotypes from the resulting simulated data. In both cases, to extract the rate g from our estimates of c, we calculated the effective population size $N_e$ from the mean genome-wide nucleotide diversity in maize [51] assuming a mutation rate of 3 × 10⁻⁸ [52]. To calculate conversion rates on a per marker basis, we assumed the average tract length to be 1 kb and the average CRM2 marker to be 200 bp long.
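The arguments to -eN in the command lines above can be related to the stated demographic model with a short calculation, and the same scaling gives g from c. The sketch below assumes one generation per year (maize is an annual) and the standard neutral expectation that diversity equals $4N_e\mu$. Neither assumption is stated explicitly in the text, and the function names are hypothetical.

```python
def scaled_time(years_ago, n_ancestral, years_per_generation=1.0):
    """Time in units of 4N generations, the scale used by ms-style -eN events.

    Assumes one generation per year unless told otherwise.
    """
    generations = years_ago / years_per_generation
    return generations / (4.0 * n_ancestral)


def relative_size(n_bottleneck, n_ancestral):
    """Bottleneck population size as a fraction of the ancestral size."""
    return n_bottleneck / n_ancestral


def effective_size_from_diversity(pi, mu):
    """N_e from nucleotide diversity, assuming pi = 4 * N_e * mu."""
    return pi / (4.0 * mu)


def per_generation_conversion_rate(c, n_e):
    """Invert c = 4 * N_e * g to recover g."""
    return c / (4.0 * n_e)


if __name__ == "__main__":
    n_anc = 450_000
    # Bottleneck began 11,000 years ago and lasted 1,000 years, so it ended 10,000 years ago.
    print(round(scaled_time(10_000, n_anc), 5))   # end of bottleneck, looking back in time
    print(round(relative_size(2_450, n_anc), 5))  # size during bottleneck
    print(round(scaled_time(11_000, n_anc), 5))   # start of bottleneck
```

Under these assumptions the three printed values are 0.00556, 0.00544, and 0.00611, matching the -eN arguments in the ms and MaCS command lines.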

Supporting Information

Figure S1 Confirmation of centromere heterozygosity and contamination by FISH. (A) A chromosome spread from IBM85, showing centromere heterozygosity at chromosome 4. Note the differing amount of red (CentC) signal on the circled chromosomes. (B) A gel image showing that IBM47 and IBM85 are heterozygous in centromere 4 flanking regions. These data show the results for the IDP476 marker. Molecular weights of the size standards (in bp) are also indicated. (C) A chromosome spread from a cross between IBM58 and B73, showing a chromosomal feature (a knob, in red) on chromosome 2 that is not present in either B73 or Mo17. CentC (faint) and the knob 180 bp repeat are shown in red, CRM2 LTR and telomeres are shown in green, and chromosomes are shown in blue. Found at: doi:10.1371/journal.pbio.1000327.s001 (1.66 MB TIF)

Figure S2 A complete list of markers from centromere 8 covering the bnlg1834 to IDP334 interval and the genotypes of IBM10, 11, and 12. Map scores for the six flanking gene markers have been previously published [13,45] and were obtained from maizegdb.org. The distances in centromere-flanking regions are shown in IBM cM units, which equate to roughly one fourth the size of a standard cM. The seven Mo17 within-centromere markers and 23 B73 within-centromere markers are distributed randomly and are not meant to convey actual distance or order relative to each other (all 30 markers map genetically to the same location). For each of the IBM genotypes, B73 polymorphisms are represented by the letter B and Mo17 polymorphisms are represented by the letter M. Found at: doi:10.1371/journal.pbio.1000327.s002 (0.55 MB TIF)

Table S1 Heterozygosity, contamination, and gene conversion in IBM lines. 1 het = heterozygous; / = contaminant centromere; gc = gene conversion. 2 IBM3 was removed. Found at: doi:10.1371/journal.pbio.1000327.s003 (0.23 MB DOC)

Acknowledgments

We thank Katrien Devos for patient guidance in mapping methodologies, and A. J. Eckert, S. Still, G. Coop, and J. van Heerwaarden for comments on an earlier version of the manuscript.

Author Contributions

The author(s) have made the following declarations about their contributions: Conceived and designed the experiments: JS RKD. Performed the experiments: JS SEW. Analyzed the data: JS JMB JRI RKD. Contributed reagents/materials/analysis tools: GGP RKD. Wrote the paper: JS JMB JRI RKD.

References

1. Murphy WJ, Larkin DM, Everts-van der Wind A, Bourque G, Tesler G, et al. (2005) Dynamics of mammalian chromosome evolution inferred from multispecies comparative maps. Science 309: 613–617.
2. O’Neill RJ, Eldridge MD, Metcalfe CJ (2004) Centromere dynamics and chromosome evolution in marsupials. J Hered 95: 375–381.
3. Lee HR, Zhang W, Langdon T, Jin W, Yan H, et al. (2005) Chromatin immunoprecipitation cloning reveals rapid evolutionary patterns of centromeric DNA in Oryza species. Proc Natl Acad Sci U S A 102: 11793–11798.
4. Henikoff S, Ahmad K, Malik HS (2001) The centromere paradox: stable inheritance with rapidly evolving DNA. Science 293: 1098–1102.
5. Smith GP (1976) Evolution of repeated DNA sequences by unequal crossover. Science 191: 528–535.
6. Charlesworth B, Sniegowski P, Stephan W (1994) The evolutionary dynamics of repetitive DNA in eukaryotes. Nature 371: 215–220.
7. Bensasson D, Zarowiecki M, Burt A, Koufopanou V (2008) Rapid evolution of yeast centromeres in the absence of drive. Genetics 178: 2161–2167.
8. Fishman L, Saunders A (2008) Centromere-associated female meiotic drive entails male fitness costs in monkeyflowers. Science 322: 1559–1562.
9. Beadle GW (1932) A possible influence of the spindle fibre on crossing-over in Drosophila. Proc Natl Acad Sci U S A 18: 160–165.
10. Lambie EJ, Roeder GS (1986) Repression of meiotic crossing over by a centromere (CEN3) in Saccharomyces cerevisiae. Genetics 114: 769–789.
11. Mahtani MM, Willard HF (1998) Physical and genetic mapping of the human X chromosome centromere: repression of recombination. Genome Res 8: 100–110.
12. Copenhaver GP, Nickel K, Kuromori T, Benito M, Kaul S, et al. (1999) Genetic definition and sequence analysis of Arabidopsis centromeres. Science 286: 2468–2474.
13. Fu Y, Wen TJ, Ronin YI, Chen HD, Guo L, et al. (2006) Genetic dissection of intermated recombinant inbred lines using a new genetic map of maize. Genetics 174: 1671–1683.
14. Yan H, Jin W, Nagaki K, Tian S, Ouyang S, et al. (2005) Transcription and histone modifications in the recombination-free region spanning a rice centromere. Plant Cell 17: 3227–3238.
15. Liebman SW, Symington LS, Petes TD (1988) Mitotic recombination within the centromere of a yeast chromosome. Science 241: 1074–1077.
16. Jaco I, Canela A, Vera E, Blasco MA (2008) Centromere mitotic recombination in mammalian cells. J Cell Biol 181: 885–892.
17. Schindelhauer D, Schwarz T (2002) Evidence for a fast, intrachromosomal conversion mechanism from mapping of nucleotide variants within a homogeneous alpha-satellite DNA array. Genome Res 12: 1815–1826.
18. Roizes G (2006) Human centromeric alphoid domains are periodically homogenized so that they vary substantially between homologues. Mechanism and implications for centromere functioning. Nucleic Acids Res 34: 1912–1924.
19. Pertile MD, Graham AN, Choo KH, Kalitsis P (2009) Rapid evolution of mouse Y centromere repeat DNA belies recent sequence stability. Genome Res 19: 2202–2213.
20. Ma J, Bennetzen JL (2006) Recombination, rearrangement, reshuffling, and divergence in a centromeric region of rice. Proc Natl Acad Sci U S A 103: 383–388.
21. Ma J, Jackson SA (2006) Retrotransposon accumulation and satellite amplification mediated by segmental duplication facilitate centromere expansion in rice. Genome Res 16: 251–259.
22. Jiang J, Birchler JA, Parrott WA, Dawe RK (2003) A molecular view of plant centromeres. Trends Plant Sci 8: 570–575.
23. Zhong CX, Marshall JB, Topp C, Mroczek R, Kato A, et al. (2002) Centromeric retroelements and satellites interact with maize kinetochore protein CENH3. Plant Cell 14: 2825–2836.
24. Wolfgruber TK, Sharma A, Schneider KL, Albert PS, Koo DH, et al. (2009) Maize centromere structure and evolution: sequence analysis of centromeres 2 and 5 reveals dynamic Loci shaped primarily by retrotransposons. PLoS Genet 5: e1000743. doi:10.1371/journal.pgen.1000743.
25. SanMiguel P, Gaut B, Tikhonov A, Nakajima Y, Bennetzen J (1998) The paleontology of intergene retrotransposons of maize. Nat Genet 20: 43–45.
26. Nagaki K, Song J, Stupar R, Parokonny AS, Yuan Q, et al. (2003) Molecular and cytological analyses of large tracks of centromeric DNA reveal the structure and evolutionary dynamics of maize centromeres. Genetics 163: 759–770.
27. Devos KM, Ma J, Pontaroli AC, Pratt LH, Bennetzen JL (2005) Analysis and mapping of randomly chosen bacterial artificial chromosome clones from hexaploid bread wheat. Proc Natl Acad Sci U S A 102: 19243–19248.
28. Luce AC, Sharma A, Mollere OS, Wolfgruber TK, Nagaki K, et al. (2006) Precise centromere mapping using a combination of repeat junction markers and chromatin immunoprecipitation-polymerase chain reaction. Genetics 174: 1057–1061.
29. Casa AM, Brouwer C, Nagel A, Wang L, Zhang Q, et al. (2000) Inaugural article: the MITE family heartbreaker (Hbr): molecular markers in maize. Proc Natl Acad Sci U S A 97: 10083–10089.
30. Sharma A, Presting GG (2008) Centromeric retrotransposon lineages predate the maize/rice divergence and differ in abundance and activity. Mol Genet Genomics 279: 133–147.
31. Lee M, Sharopova N, Beavis WD, Grant D, Katt M, et al. (2002) Expanding the genetic map of maize with the intermated B73 × Mo17 (IBM) population. Plant Mol Biol 48: 453–461.
32. Kato A, Lamb JC, Birchler JA (2004) Chromosome painting using repetitive DNA sequences as probes for somatic chromosome identification in maize. Proc Natl Acad Sci U S A 101: 13554–13559.
33. Jin W, Melo JR, Nagaki K, Talbert PB, Henikoff S, et al. (2004) Maize centromeres: organization and functional adaptation in the genetic background of oat. Plant Cell 16: 571–581.
34. Liu K, Goodman M, Muse S, Smith JS, Buckler E, et al. (2003) Genetic structure and diversity among maize inbred lines as inferred from DNA microsatellites. Genetics 165: 2117–2128.
35. Kelly JK (1997) A test of neutrality based on interlocus associations. Genetics 146: 1197–1206.
36. Hudson RR, Kaplan NL (1985) Statistical properties of the number of recombination events in the history of a sample of DNA sequences. Genetics 111: 147–164.
37. Remington DL, Thornsberry JM, Matsuoka Y, Wilson LM, Whitt SR, et al. (2001) Structure of linkage disequilibrium and phenotypic associations in the maize genome. Proc Natl Acad Sci U S A 98: 11479–11484.
38. Jeffreys AJ, May CA (2004) Intense and highly localized gene conversion activity in human meiotic crossover hot spots. Nat Genet 36: 151–156.
39. Hudson RR (2001) Two-locus sampling distributions and their application. Genetics 159: 1805–1817.
40. Yandeau-Nelson MD, Zhou Q, Yao H, Xu X, Nikolau BJ, et al. (2005) MuDR transposase increases the frequency of meiotic crossovers in the vicinity of a Mu insertion in the maize a1 gene. Genetics 169: 917–929.
41. Yu JM, Holland JB, McMullen MD, Buckler ES (2008) Genetic design and statistical power of nested association mapping in maize. Genetics 178: 539–551.
42. Doyle JJ, Doyle JL (1987) A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochemistry Bulletin 19: 11–15.
43. Lander ES, Green P, Abrahamson J, Barlow A, Daly MJ, et al. (1987) MAPMAKER: an interactive computer package for constructing primary genetic linkage maps of experimental and natural populations. Genomics 1: 174–181.
44. Topp CN, Zhong CX, Dawe RK (2004) Centromere-encoded RNAs are integral components of the maize kinetochore. Proc Natl Acad Sci U S A 101: 15986–15991.
45. Sharopova N, McMullen MD, Schultz L, Schroeder S, Sanchez-Villeda H, et al. (2002) Development and mapping of SSR markers for maize. Plant Mol Biol 48: 463–481.
46. Thornton K (2003) Libsequence: a C++ class library for evolutionary genetic analysis. Bioinformatics 19: 2325–2327.
47. Wright SI, Bi IV, Schroeder SG, Yamasaki M, Doebley JF, et al. (2005) The effects of artificial selection on the maize genome. Science 308: 1310–1314.
48. Hudson RR (2002) Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18: 337–338.
49. Hill WG, Weir BS (1988) Variances and covariances of squared linkage disequilibria in finite populations. Theor Popul Biol 33: 54–78.
50. Chen GK, Marjoram P, Wall JD (2009) Fast and flexible simulation of DNA sequence data. Genome Res 19: 136–142.
51. Gore MA, Chia JM, Elshire RJ, Sun Q, Ersoz ES, et al. (2009) A first-generation haplotype map of maize. Science 326: 1115–1117.
52. Clark RM, Tavare S, Doebley J (2005) Estimating a nucleotide substitution rate for maize from polymorphism at a major domestication locus. Mol Biol Evol 22: 2304–2312.



Expression in Aneuploid Drosophila S2 Cells

Yu Zhang1, John H. Malone1, Sara K. Powell2, Vipul Periwal3, Eric Spana4, David M. MacAlpine2, Brian Oliver1*

1 Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland, United States of America, 2 Department of Pharmacology and Cancer Biology, Duke University Medical Center, Durham, North Carolina, United States of America, 3 Laboratory of Biological Modeling, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland, United States of America, 4 Department of Biology, Duke University, Durham, North Carolina, United States of America

Abstract

Extensive departures from balanced gene dose in aneuploids are highly deleterious. However, we know very little about the relationship between gene copy number and expression in aneuploid cells. We determined copy number and transcript abundance (expression) genome-wide in Drosophila S2 cells by DNA-Seq and RNA-Seq. We found that S2 cells are aneuploid for >43 Mb of the genome, primarily in the range of one to five copies, and show a male genotype (~two X chromosomes and four sets of autosomes, or 2X;4A). Both X chromosomes and autosomes showed expression dosage compensation. X chromosome expression was elevated in a fixed-fold manner regardless of actual gene dose. In engineering terms, the system ‘‘anticipates’’ the perturbation caused by X dose, rather than responding to an error caused by the perturbation. This feed-forward regulation resulted in precise dosage compensation only when X dose was half of the autosome dose. Insufficient compensation occurred at lower X chromosome dose and excessive expression occurred at higher doses. RNAi knockdown of the Male Specific Lethal complex abolished feed-forward regulation. Both autosome and X chromosome genes show Male Specific Lethal–independent compensation that fits a first order dose-response curve. Our data indicate that expression dosage compensation dampens the effect of altered DNA copy number genome-wide. For the X chromosome, compensation includes fixed and dose-dependent components.

Citation: Zhang Y, Malone JH, Powell SK, Periwal V, Spana E, et al. (2010) Expression in Aneuploid Drosophila S2 Cells. PLoS Biol 8(2): e1000320. doi:10.1371/journal.pbio.1000320

Academic Editor: Peter B. Becker, Adolf Butenandt Institute, Germany

Received September 23, 2009; Accepted January 20, 2010; Published February 23, 2010

This is an open-access article distributed under the terms of the Creative Commons Public Domain declaration which stipulates that, once placed in the public domain, this work may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose.

Funding: This work was supported by the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) Intramural Research Program, National Human Genome Research Institute (NHGRI) extramural grant HG004279, and a Whitehead Foundation Scholar Award. The funders had no role in study design, data collection and analysis or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

Abbreviations: CGH, comparative genome hybridization; ChIP, chromatin immunoprecipitation; CPA, Bayesian change point analysis; dsRNA, double stranded RNA; MSL, male specific lethal; RPKM, reads per kb per million reads; RNAi, RNA interference

* E-mail: oliver@helix.nih.gov

Introduction

The somatic cells of multicellular animals are almost exclusively diploid, with haploidy restricted to post-meiotic germ cells. Having two copies of every gene has an obvious advantage. Mutations arise de novo within cells of an organism and within organisms in populations, such that deleterious mutation-free haploid genomes are extremely rare. The wild type alleles of genes tend to be dominant to the recessive loss-of-function alleles, providing a degree of redundancy allowing diploid organisms to survive even with a substantial genetic load of deleterious mutations in each haplotype. While the dose of most individual genes is of little consequence to the organism, larger scale genomic imbalance, or aneuploidy, is detrimental [1–4]. Chromosomal aneuploidy occurs when whole chromosomes are lost or duplicated and segmental aneuploidy results from deletions, duplications, and unbalanced translocations. In Drosophila, a systematic genome-wide segmental aneuploidy study [5] demonstrated that of all genes (now known to be about 15,000 [6]), only about 50 are haploinsufficient and just one gene is triplo-lethal. However, these same experiments showed that large deletions and duplications result in reduced viability and fertility that depends on the extent of aneuploidy, and not on any particular region or gene [5]. This indicates that the detrimental effect of aneuploidy is a collective function of multiple small effects, not a function of particular genes.

Interestingly, while aneuploidy results in inviability at the organism level, aneuploid cells can out-compete diploid cells for growth in vivo or in vitro. Human cancer cells are a good example of proliferating cells characterized by aneuploidy [7]. Most tumors are nearly diploid or tetraploid with extra or lost chromosomes. Even tumors with a normal number of chromosomes contain other rearrangements that result in segmental aneuploidy. It is likely that aneuploidy results in a systems or gene interaction defect. Given that a deleterious effect of aneuploidy is likely to occur at the level of genome balance, understanding the response to aneuploidy requires the exploration of general control mechanisms that operate at the network level. We have turned to widely used Drosophila S2 tissue culture cells as an aneuploid model [8,9]. These cells are generally tetraploid [9] and studies of gene expression and X chromosome dosage compensation indicate that they are male [10]. As a natural consequence of chromosomal sex determination in Drosophila, females have two X chromosomes and two pairs of autosomes (2X;2A) and males have a single X chromosome (1X;2A) [11]. Therefore, male cells can be thought of as naturally occurring


February 2010 | Volume 8 | Issue 2 | e1000320


Expression in Aneuploid Cells

chromosomal aneuploids. The response to altered gene dose probably occurs at multiple levels, but transcription is an early step in the flow of information from the genome and is a likely site for control. For example, X chromosome dosage compensation clearly occurs at the transcriptional level [12] and is exquisitely precise [13]. The Male Specific Lethal (MSL) complex regulates the balanced expression of X chromosomes in wild type 1X;2A male flies. MSL is composed of at least four major proteins (Msl1, Msl2, Msl3, and Mof) and two non-coding RNAs (RoX1 and RoX2) [11]. Mof is an acetyltransferase responsible for acetylating H4K16 [11,14,15]. Mof is highly enriched on the male X chromosome as a component of the MSL complex. However, Mof also associates with a more limited repertoire of autosomal genes independently of MSL [16]. H4K16ac is associated with increased transcription in many systems [17]. Therefore, it is widely believed that this acetylation results in increased expression of the X chromosome [11], although an alternative hypothesis suggests that MSL sequesters Mof from the autosomes to drive down autosome expression [18]. Determining which of these mechanisms occurs is complicated by the very nature of sampling experiments when much of the transcriptome is altered. The number of X chromosome transcripts sampled from the transcriptome depends on the relative abundance of the X chromosome and autosome transcripts. The salient feature of both models is balanced X chromosome and autosome expression.

While the term dosage compensation is used to describe X chromosome expression, dosage compensation is not restricted to X chromosomes in Drosophila. Autosomes also show significant, but much less precise, dosage compensation at the expression level [13,19–21], suggesting that there is a general dose response genome-wide. Despite the clear role of MSL in X chromosome dosage compensation, the control system rules for MSL function and the contribution of global compensation mechanisms to the specific case of the X chromosome are poorly understood.

There are three basic transcript control mechanisms that could modify the effect of gene dose: buffering, feedback, and feedforward [22]. Here we define buffering as the passive absorption of gene dose perturbations by inherent system properties. For example, if transcription obeys mass-action kinetics and the gene/transcription complex is considered an enzyme [23], then one would not expect a one-to-one relationship between mRNA and gene copy because of the small effect of a change in enzyme concentration at steady-state [24]. In addition to the enzymatic properties of transcription, more than a generation of molecular biologists has elegantly described extensive transcriptional regulation networks controlling key phenotypes [25]. These regulatory motifs are sensitive to changes in gene dose [26].

Feedback is an outstanding error-controlled regulator that detects deviations from the norm and implements corrective action. Feed-forward regulation differs in that it anticipates the possible effect of perturbations on the system rather than correcting the perturbation after the deviation occurs. This could operate if cells detect copy number and correct transcription levels before a quantitative error in transcript abundance is evident. In male embryos, the sex determination hierarchy detects X chromosome number and leads to association of the MSL complex with the X chromosome before zygotic transcription is activated [27], as expected for a feed-forward regulator. However, MSL is selectively bound to transcribed genes [28], which is also consistent with feedback regulation. By examining the response of X chromosome genes to dose in the presence and absence of MSL, we show that X chromosome dosage compensation results from a combination of MSL-dependent feed-forward regulation based on anticipated effects from unbalanced gene dose and a more general and dynamic response to perceived gene dose. The latter could be due to negative feedback, buffering, or both.

Author Summary

While it is widely recognized that mutations in protein coding genes can have harmful consequences, one can also have too much or too little of a good thing. Except for the sex chromosomes, genes come in sets of two in diploid organisms. Extra or missing copies of genes or chromosomes result in an imbalance that can lead to cancers, miscarriages, and disease susceptibility. We have examined what happens to gene expression in Drosophila cells with the types of gross copy number changes that are typical of cancers. We have compared the response of autosomes and sex chromosomes and show that there is some compensation for copy number change in both cases. One response is universal and acts to correct copy number changes by changing transcript abundance. The other is specific to the X chromosome and acts to increase expression regardless of gene dose. Our data highlight how important gene expression balance is for cell function.

Results

Segmental Aneuploidy in S2 Cells

To determine the extent of aneuploidy in S2 cells, we performed next generation sequencing (DNA-Seq) and comparative genome hybridization (CGH). These data confirmed the predicted male genotype of S2 cells, as the average sequence depth of the X chromosome (reads per kb per million reads, RPKM) was 54% of the autosome RPKM (Figures 1 and 2A). We also found that S2 cells exhibit numerous large regions of segmental aneuploidy (Figure 1, Figure S1, Table S1). Stepwise deviations from expected dose covered ~42% (~40.0 Mb) of the autosomes and ~17% (~3.8 Mb) of the X chromosome (Figure S1). The vast majority of the aneuploid segments showed an extra or lost copy. There was high congruence between DNA-Seq and CGH methods. For example, we determined that >93% of calls for copy numbers between one and five made by DNA-Seq analysis were confirmed by CGH, even when comparing different lots of cells grown under slightly different conditions (Figure S2, Table S2). These data suggest that S2 cells are highly aneuploid but show a reasonably stable genotype. There was much more variability seen when copy number was greater than five (30% agreement between methods and cultures). This could be due to failure to call short segmental duplications or to repeat expansion/retraction in different cultures. Regardless of cause, we decided to focus our subsequent expression analyses on the high-confidence one to five copy genes (Table S3).
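For readers unfamiliar with the RPKM unit used for sequence depth, the following short Python sketch shows how a per-window RPKM value and an X-to-autosome depth ratio could be computed. The function names and the toy numbers are assumptions for illustration, not the authors' pipeline.

```python
from statistics import mean


def rpkm(read_count, window_bp, total_mapped_reads):
    """Reads per kb of window per million mapped reads."""
    kilobases = window_bp / 1_000
    millions = total_mapped_reads / 1_000_000
    return read_count / (kilobases * millions)


def x_to_autosome_ratio(x_rpkm, autosome_rpkm):
    """Average X chromosome depth as a fraction of average autosome depth."""
    return mean(x_rpkm) / mean(autosome_rpkm)


if __name__ == "__main__":
    # Toy values only: two 1 kb X windows and two 1 kb autosome windows.
    x_windows = [rpkm(27, 1_000, 1_000_000), rpkm(27, 1_000, 1_000_000)]
    a_windows = [rpkm(50, 1_000, 1_000_000), rpkm(50, 1_000, 1_000_000)]
    print(x_to_autosome_ratio(x_windows, a_windows))
```

A ratio near 0.5 is what a 2X;4A (or 1X;2A) genotype would be expected to give, which is the sense in which the reported 54% supports a male genotype.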

Genome-Wide Compensation

We observed striking differences in DNA-Seq read density among chromosome arms due to segmental aneuploidy (Figure 2A, p < 10⁻¹⁵, KS test). To determine if these DNA differences are also associated with similar changes at the transcript level, we profiled transcript expression by next generation sequencing (RNA-Seq). We validated RNA-Seq data by microarray profiling and found outstanding agreement ($r_s$ = 0.87, p = 0). Expression analysis revealed striking dosage compensation. Even though copy number values significantly differed at the chromosome level (Figure 2A), we found that expression from autosome arms and the X


Figure 1. S2 cell DNA copy number. (A–D) DNA density and copy number profiles of the X chromosome (A, B) and chromosome 2L (C, D), showing copy number of aneuploidy segments along chromosome length. The RPKM DNA-Seq density in nonoverlapping 1 kb windows was plotted against the chromosome coordinates and the final deduced copy number is indicated (color key). The copy number was determined by Bayesian change point analysis (CPA) (A, C) and CGH (B, D). The CGH results are projected onto the DNA-Seq data. The average DNA densities of each aneuploid segment between predicted breakpoints (black lines) are shown. doi:10.1371/journal.pbio.1000320.g001


Figure 2. Expression at varying copy numbers. (A, B) Boxplots showing the distribution of DNA-Seq read densities (in non-overlapping 1 kb windows) mapped to chromosome arms in S2 cells (A) and the distribution of RNA-Seq expression values at the gene-level (B). Pie charts (A, B) show the distributions of copy numbers on each chromosome arm (for expressed genes only). See Figure 1 for copy number color key. The X chromosome is in red. (C, D) Boxplots showing the distribution of RNA-Seq expression values by copy number (C) and expression per copy (D). Equivalent expression medians for two copies on the X and four copies on the autosomes are indicated (dashed line). For all boxplots, the 25th to 75th percentiles (boxes), medians (lines in boxes), and ranges (whiskers, 1.5 times the interquartile range extended from both ends of the box) are shown. Asterisks indicate significant differences from all other chromosome arms (A, B) or from the 2X or 4A baseline (C). doi:10.1371/journal.pbio.1000320.g002

chromosome were similar inter se (Figure 2B). In no case was the expression of a chromosome arm significantly different from all other arms (p > 10⁻², KS test), indicating that dosage compensation occurs genome-wide, not just on the X chromosome. To examine the precision of dosage compensation, we determined the relationship between expression and copy number. Compensation was not perfect, as expression increased with copy number (Figure 2C, p < 10⁻⁴, KS test). This imperfect compensation resulted in a sublinear relationship between copy number and gene expression, such that per copy expression values decreased with increased copy number on the autosomes and especially on the X chromosome (Figure 2D). This inverse relationship between copy number and expression per copy indicates that partial dosage compensation occurs genome-wide.
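The KS tests reported here compare whole distributions rather than means. As a rough sketch of what the two-sample Kolmogorov-Smirnov statistic measures, the Python below computes the largest gap between two empirical cumulative distribution functions. It returns only the statistic D, not a p-value, and the function name is an assumption for this example.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max absolute difference between the two ECDFs."""
    a_sorted = sorted(sample_a)
    b_sorted = sorted(sample_b)
    if not a_sorted or not b_sorted:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    # The ECDFs only change at observed values, so checking those is sufficient.
    for value in sorted(set(a_sorted) | set(b_sorted)):
        ecdf_a = bisect_right(a_sorted, value) / len(a_sorted)
        ecdf_b = bisect_right(b_sorted, value) / len(b_sorted)
        largest_gap = max(largest_gap, abs(ecdf_a - ecdf_b))
    return largest_gap
```

Identical samples give D = 0 and samples with no overlap give D = 1. In practice a statistics library would be used to obtain the corresponding p-value.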

The X Chromosome

X chromosome dosage compensation was of particular interest. In wild type males, X chromosome dose (1X) is 50% of autosomal dose (2A). In S2 cells this relationship occurred at 2X;4A due to tetraploidy. The precision of X chromosome dosage compensation in S2 cells was revealed by the indistinguishable expression of two copy X chromosome genes and four copy autosome genes (Figure 2C, p = 0.15, KS test). Thus X chromosome dosage compensation shows similar efficacy in diploid 1X;2A flies and in aneuploid 2X;4A tissue culture cells.

February 2010 | Volume 8 | Issue 2 | e1000320


Expression in Aneuploid Cells

The aneuploid S2 cells also allowed us to examine the effect of X chromosome dosage compensation when the X chromosome dose was greater or less than 50%. Precise X chromosome dosage compensation did not occur at these other gene doses (Figure 2C, p < 10⁻⁹, KS test). For example, when we compared expression from three copy genes on the X chromosome and autosomes, X chromosome gene expression per copy was higher despite identical copy number (Figure 2D). Thus, we suggest that X chromosome dosage compensation is error generating when the underlying X chromosome gene dose is equivalent to the autosomal gene dose. Similarly, we found under-compensated X chromosome expression when there was a single copy of an X chromosome segment. These data indicate that the anticipated or predicted X chromosome copy number that implements the sex and dosage compensation pathway determines X chromosome expression. The actual X chromosome dose is not a factor. This error generation following perturbation is a property of feed-forward regulation [22].

MSL Complex

To evaluate the effect of the MSL complex on appropriate and error generating X chromosome dosage compensation in S2 cells, we performed RNA interference (RNAi) experiments to knockdown expression of two genes encoding key MSL components, msl2 and mof. If MSL operates via feedback regulation, then knockdown should differentially alter expression depending on dose, whereas if MSL is a feed-forward regulator, the effect of MSL on expression should be X chromosome specific but dose independent. We selected double stranded RNAs (dsRNA) targeting msl2 and mof that resulted in greater than 90% knockdown at the mRNA (not shown) and protein levels (Figure 3A). MSL is a chromatin-modifying machine. We therefore also determined if RNAi altered X chromatin. The X chromosome showed high levels of acetylation at expressed genes (Figure 3B and 3C), and both msl2 and mof RNAi resulted in markedly reduced H4K16ac levels on the X chromosome as determined by chromatin immunoprecipitation on microarray (ChIP-chip, Figure 3B, 3D, and 3E). RNAi against mof also resulted in decreased autosomal H4K16ac (Figure 3B and 3E). All these data suggest that the RNAi treatments were effective.

We then measured the effect of msl2 and mof RNAi on expression by RNA-Seq. As in the previous experiments, we validated expression by microarray expression profiling and found outstanding agreement (r_s = 0.87–0.89, p = 0, Figure S3). We observed decreased expression of X chromosome genes following either RNAi treatment (Figure 4, p < 10⁻², KS test), consistent with the role of MSL in promoting expression of X chromosome genes relative to autosomes. For example, in mof RNAi cells we observed a median expression of 26.4 RPKM for autosomal genes present at four copies and only 18.6 RPKM for X chromosome genes present at two copies (p < 10⁻¹⁵, KS test). The msl2 or mof RNAi treatments broke the precise equilibration of 2X with 4A expression.

We observed 1.35-fold greater X chromosome expression attributable to wild type Msl2 or Mof (average RNAi/Mock expression ratio = 0.74, p < 10⁻¹⁵, KS test), with little to no effect on autosomal expression (Figure 5A and 5B). If MSL acts as a strict feed-forward regulator, then MSL would have the same fold effect on all populations of X chromosome genes at a given copy number, irrespective of the actual copy number. Indeed, we observed a similar fold effect on the expression of X chromosome genes with different copy numbers (Figure 5C and 5D, 0.58 < p < 0.89 in msl2 RNAi, 0.21 < p < 0.91 in mof RNAi, KS test). These data clearly indicate that MSL acts on expression based on X chromosome gene nature, rather than monitoring actual copy number.

Drosophila X chromosomes are dosage compensated over the full range of gene expression values. Given that MSL is bound selectively to expressed genes, we also asked if there is a relationship between expression levels and dosage compensation. We determined that the RNAi treatments had the same effect on X chromosome gene expression regardless of expression levels (Figure 5E and 5F). Interestingly, these experiments also showed only a modest effect of mof on autosomal expression, suggesting that the proposed autosomal function of Mof [16] is subtle. The effect of Mof on autosomes was expression level dependent, as we observed a greater fold effect at low expression levels. However, the most overt effect of wild type Msl2 or Mof was a 1.35-fold increase in X chromosome expression at all expression values. These data indicate that MSL acts as a feed-forward multiplier causing a fixed-fold effect on X chromosome expression regardless of gene copy number and basal gene expression value.

Genome-Wide Sublinear Expression Response to Gene Dose

X chromosome dosage compensation is 2-fold, but we observed only a 1.35-fold effect of MSL. If MSL is the only contributor to X chromosome dosage compensation and if knockdown was complete, we would expect X chromosome and autosome genes with the same copy number to show the same expression levels following msl2 or mof RNAi treatment. However, following either msl2 or mof RNAi, three copy genes on the X chromosome were still 1.19-fold over-expressed relative to three copy genes on autosomes (Figure 6A, p < 0.01, KS test). This difference between expected and observed expression could be due to residual MSL activity exclusively, or due to a combination of residual MSL activity and an MSL-independent component of X chromosome dosage compensation. The MSL-independent compensation could be the same as observed on the autosomes. Given that the fixed-fold properties of MSL also apply to residual activity, then the over-expression of X chromosome genes following RNAi treatment should also have a fixed fold effect if there is residual MSL activity. We observed significantly increased variance in the expression ratios between the X chromosome and autosomes following RNAi (p < 10⁻², F test, Figure 6B). This supports the idea that much of the unexplained X chromosome dosage compensation is not due to a fixed-fold effect on expression. It is possible that there are MSL-dose dependent effects on X chromosome expression due to variable affinity, although the fixed-fold effect of MSL knockdown on the population of genes makes this less likely. These data suggest that there is an MSL-independent component of X chromosome dosage compensation. To determine if the MSL-independent component is the same dosage compensation system that operates on autosomes, we characterized the sublinear expression response to gene dose for the X chromosome and autosomes with or without RNAi treatment.
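The gap between the nominal 2-fold compensation and the measured MSL effect can be made explicit with a few lines of arithmetic. The sketch below uses only numbers quoted in the text (the 0.74 RNAi/Mock ratio, the 2-fold target, and the 1.19-fold residual); the variable and function names are hypothetical, and the "unexplained" factor is an illustrative back-calculation rather than a value reported by the authors.

```python
def fold_from_ratio(rnai_over_mock):
    """Fold increase attributable to the knocked-down factor."""
    return 1.0 / rnai_over_mock


# Values quoted in the text.
rnai_over_mock = 0.74        # average RNAi/Mock expression ratio on the X
nominal_total_fold = 2.0     # nominal X chromosome dosage compensation
residual_x_over_a = 1.19     # three copy X vs three copy autosomes after RNAi

msl_fold = fold_from_ratio(rnai_over_mock)
# Fold change still needed to reach 2-fold if MSL supplies only msl_fold.
unexplained_fold = nominal_total_fold / msl_fold

print(f"MSL fixed-fold effect: {msl_fold:.2f}")
print(f"Remaining factor to reach 2-fold: {unexplained_fold:.2f}")
print(f"Residual X over-expression after RNAi: {residual_x_over_a:.2f}")
```

The residual 1.19-fold over-expression is smaller than the remaining factor computed here, which is compatible with the text's point that the comparison is made at equal copy number, where dose-responsive compensation is not expected to contribute its full effect.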
There were three distinct trend lines for the relationship between copy number and expression: one for the autosomes and one each for the X chromosome with and without RNAi treatment (Figure 6A). There are an infinite number of possible sublinear curves. If the nature of the dose response differed between the X chromosome and the autosomes, or depended on the presence or absence of MSL, then scaling should not result in a common fit. However, if the three dose response curves are the result of a common dosage compensation mechanism, then they should scale to yield a single curve that fits all three of the absolute dose-response curves.


Figure 3. msl2 and mof RNAi. (A) Western analysis showing changes in MSL protein abundance following RNAi for msl2 and mof in S2 cells. (B) K-means clustering (k = 3) of H4K16ac ChIP/input ratio for expressed genes on the X chromosome and chromosome 3R in RNAi and mock treated S2 cells. Genes enriched (yellow) and depleted (blue) for H4K16ac are indicated. (C) Boxplots showing the distribution of H4K16ac ChIP/input ratios in mock treated cells for expressed genes on different chromosome arms. (D–E) Boxplots showing the distribution of H4K16ac ChIP ratios between msl2 RNAi cells (D) or mof RNAi cells (E) and mock treated cells for expressed genes on different chromosome arms. Significant differences (p < 10⁻²) among chromosome arms (C) and between RNAi and mock treated cells (D, E) are indicated by asterisks. doi:10.1371/journal.pbio.1000320.g003

Figure 4. Expression following msl2 or mof RNAi. Boxplots showing the distribution of expression RPKM values at indicated copy number on the X chromosome (left) and autosomes (right) in RNAi and mock treated S2 cells. Equivalent expression of two copy X chromosome genes and four copy autosomal genes in mock treated cells is shown (dashed line). See Figure 2 for boxplot format. Asterisks indicate significant expression decrease in RNAi cells compared to mock treated cells. doi:10.1371/journal.pbio.1000320.g004

We set median expression fold change at 2X and 4A to 1.0 for both copy number and expression (Figure 6C). We found that X chromosome and autosomes show remarkably similar fold changes in expression relative to fold changes in copy number. Additionally, the relationship between X chromosome expression and copy number is MSL independent following scaling. These data suggest that like the autosomes, the X chromosome is subject to dosage compensation based on actual gene dose. The gene dose to expression response fits a one parameter model y = x(EC50 + 1)/(EC50 + x), where y is transcript abundance, x is DNA copy number expressed as a ratio relative to wild type, and EC50 is the copy number required for half maximal expression (r² > 0.99). This indicates that gene expression is a saturating function of gene dose regardless of chromosome location or the presence of MSL.
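To make the one parameter model concrete, here is a minimal Python sketch of y = x(EC50 + 1)/(EC50 + x). The function names are hypothetical. The EC50 used below is not a value reported in the text: it is back-calculated, purely for illustration, from the statement in the Discussion that a 2-fold change in scaled DNA dose gives about a 1.5-fold change in scaled expression.

```python
def scaled_expression(x, ec50):
    """One parameter model from the text: y = x(EC50 + 1)/(EC50 + x).

    x is copy number relative to the baseline (x = 1 gives y = 1).
    """
    return x * (ec50 + 1.0) / (ec50 + x)


def ec50_from_point(x, y):
    """Solve the model for EC50 from one scaled (dose, expression) point.

    Requires x != 1 and y != x (y == x would mean no compensation at all).
    """
    return x * (1.0 - y) / (y - x)


# Illustrative assumption: 2-fold dose -> about 1.5-fold expression.
ec50 = ec50_from_point(2.0, 1.5)

for dose in (0.5, 1.0, 1.5, 2.0):
    y = scaled_expression(dose, ec50)
    print(f"dose x{dose:.1f} -> expression x{y:.2f}, per copy x{y / dose:.2f}")
```

The per copy column falls as dose rises, which is the sublinear (partially compensated) behaviour described above; as x grows the model saturates at EC50 + 1.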

Discussion

Our data indicate that the MSL complex and general compensation mechanisms independently contribute to male X chromosome dosage compensation. The MSL complex recognizes active X chromosome genes [28–31]. We have shown that MSL then acts as a simple unidirectional multiplier of expression regardless of the actual gene dose and gene expression level. In contrast, buffering and feed-back are dose sensitive and absorb the expression perturbations caused by unbalanced dose. We suggest that all these mechanisms are critical for proper X chromosome dosage compensation.

Some rough accounting illustrates the composite nature of X chromosome dosage compensation. In the Drosophila genus, dosage compensation results in a 2.0- to 2.2-fold increase in X chromosome expression in males relative to autosomes [13,32]. Similarly, in S2 cells we observed a 2.08-fold increase in X chromosome expression. The fixed-fold effect of MSL resulted in at least a 1.35-fold increase in X-chromosome expression. Dose-responsive compensation also acted to increase X chromosome expression and was independent of MSL function. We can estimate the contribution of dose-responsive compensation from work performed on whole flies and on S2 cells. Autosomal dosage compensation increases per copy expression by 1.4- to 1.6-fold in diploid flies with a single copy of tens of genes [13,19]. In agreement with those reported values, we can project that a 2-fold change in scaled DNA dose in S2 cells results in about a 1.5-fold increase in scaled gene expression. Thus, at face value, the layered effect of dose-responsive compensation and feed-forward dosage compensation may explain all of the final increase in S2 cell X chromosome expression (1.50-fold × 1.35-fold = 2.03-fold).

While most work on dosage compensation focuses on the X chromosome [2,11], other organisms also show dosage compensation on autosomes [33]. For example, mammalian trisomies show only about a 1.3-fold increase in gene expression as a result of a 1.5-fold change in gene dose [34,35]. Compensation is likely to be a universal property of biological systems that enables cells to avoid deleterious effects of genetic load and other perturbations.
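The rough accounting in the Discussion treats the two contributions as independent multipliers. Written out as a worked equation, using only the figures quoted in the text (the under-brace labels are added here for readability):

```latex
\[
\underbrace{1.50}_{\text{dose-responsive compensation}}
\times
\underbrace{1.35}_{\text{MSL fixed-fold effect}}
= 2.025 \approx 2.03
\]
```

This product sits inside the 2.0- to 2.2-fold range cited for the Drosophila genus and close to the 2.08-fold increase observed in S2 cells.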

Materials and Methods

Cell Strains and Media

Drosophila S2 cells [9] (a.k.a. SL2) were obtained from the Drosophila RNAi Screening Center (DRSC, Harvard Medical School, Boston, MA) and were grown at 25°C in Schneider’s Drosophila Medium (Invitrogen, Carlsbad, CA) supplemented with 10% Fetal Bovine serum (SAFC Biosciences, Lenexa, KS) and Penicillin-Streptomycin (Invitrogen, Carlsbad, CA). These cells were used for all experiments, except CGH, where S2-DRSC cells were obtained from the Drosophila Genomics Resource Center (#181, Bloomington, IN).

Figure 5. Mof and Msl2 effects on expression. (A, B) Boxplots showing the distribution of expression ratios between msl2 RNAi cells (A) or mof RNAi cells (B) and mock treated cells by chromosome arms. The expected fold decrease in X chromosome expression after RNAi treatment is indicated (red dashed line). (C, D) Boxplots showing the expression ratios for msl2 (C) and mof (D) RNAi treated cells at indicated gene copy numbers. The X chromosome (left) and autosomes (right) are shown separately. (E, F) The relation between gene expression and fold expression change in msl2 (E) and mof (F) RNAi treated cells plotted as a moving average (20 gene/window). doi:10.1371/journal.pbio.1000320.g005

Figure 6. Characterization of dose-response curves. (A, C) Median expression RPKM values plotted against the DNA copy for X chromosome and autosome genes in RNAi and mock treated S2 cells based on absolute (A) or scaled (C) data. Fitted trend lines for the X chromosome (red) and autosomes (black) following mock (solid), msl2 (dashed), and mof (dotted) RNAi treatment are indicated. (B) Boxplots and table showing the distribution of expression ratios among different copy numbers. Expression fold change values were calculated based on real median RPKM values (bold) or projected expression values. Asterisks indicate significant variation for the expression fold change between X chromosome and autosome genes at an equivalent dose in RNAi cells (p < 10⁻²). doi:10.1371/journal.pbio.1000320.g006

Sequencing

We extracted S2 cell genomic DNA using a genomic DNA kit (Qiagen, Valencia, CA). Approximately 2 µg of purified genomic DNA was randomly fragmented to less than 1,000 bp by 30 min sonication at 4°C with cycles of 30 s pulses with 30 s intervals using the Bioruptor UCD 200 and a refrigerated circulation bath RTE-7 (Diagenode, Sparta, NJ). Sonicated chromatin (see ChIP protocol) was purified by phenol/chloroform extraction. We extracted S2 cell total RNA with Trizol (Invitrogen, Carlsbad, CA) and isolated mRNA using Oligotex poly(A) (Qiagen, Valencia, CA). The number of cells used for each extraction was counted using a haemocytometer. The quality of mRNA was examined by RNA 6000 Nano chip on a Bioanalyzer 2100 (Agilent, Santa Clara, CA) according to the manufacturer’s protocol. One hundred ng of the extracted mRNA was then fragmented in fragmentation buffer (Ambion, Austin, TX) at 70°C for exactly 5 min. The first strand cDNA was then synthesized by reverse transcriptase using the cleaved mRNA fragments as template and high concentration (3 µg) random hexamer primers (Invitrogen, Carlsbad, CA). After the first strand was synthesized, second strand cDNA synthesis was performed using 50 U DNA polymerase I and 2 U RNaseH (Invitrogen, Carlsbad, CA) at 16°C for 2.5 h.

Deep sequencing of both DNA and short cDNA fragments was performed [36,37]. Libraries were prepared according to instructions for genomic DNA sample preparation kit (Illumina, San Diego, CA). The library concentration was measured on a Nanodrop spectrophotometer (NanoDrop products, Wilmington, DE), and 4 pM of adaptor-ligated DNA was hybridized to the flow cell. DNA clusters were generated using the Illumina cluster station, followed by 36 cycles of sequencing on the Illumina Genome Analyzer, in accordance with the manufacturer’s protocols. Two technical replicate libraries were constructed for each DNA-Seq sample. Two libraries were prepared from two biological replicates of each RNA material (RNAi or mock treated).

RNAi

dsRNA for RNAi treatment [38] was produced by in vitro transcription of a PCR generated DNA template from Drosophila genomic DNA containing the T7 promoter sequence on both ends. Target sequences were scanned to exclude any complete 19-mer homology to other genes [39]. The dsRNAs were generated using the MEGAscript T7 kit (Ambion, Austin, TX) and purified using RNeasy kit (Qiagen, Valencia, CA). Two different primer sets were used for each target gene, and the one with better RNAi efficiency was used for downstream experiments. The selected primer sequences for generation of msl2 dsRNA template by PCR were as follows: forward, 5′-taatacgactcactatagggTTGCTCCGACTTCAAGACCT-3′, and reverse, 5′-taatacgactcactatagggGCATCACGTAGGAGACAGCA-3′, and the selected primer sequences for generation of mof dsRNA template were as follows: forward, 5′-taatacgactcactatagggGACGGTCATCACAACAGGTG-3′, and reverse, 5′-taatacgactcactatagggTGCGGTCGCTGTAGTCATAG-3′.

For RNAi treatment, S2 cells were resuspended in serum free media at 2×10⁶ cells/ml. Twenty µg dsRNA was added to 1 ml of cell suspension and incubated for 45 min at room temperature. Cells with the same serum free media treatment but without added dsRNA were used as mock treated controls. After the incubation, 3 ml complete medium was added and the cells were cultured for another 4 d. Cells were collected and split into three aliquots for mRNA extraction, chromatin immunoprecipitation, and western analysis.

ChIP

For ChIP [40], 5–10×10⁶ S2 cells were fixed with 1% formaldehyde in tissue culture media for 10 min at room temperature. Glycine was added to a final concentration of 0.125 M to stop cross-linking. After 5 min of additional incubation and two washes with ice-cold PBS, cells were collected and resuspended in cell lysis buffer (5 mM pH 8.0 PIPES buffer, 85 mM KCl, 0.5% Nonidet P40, and protease inhibitors cocktail from Roche, Basel, Switzerland) for 10 min and then resuspended in nuclei lysis buffer (50 mM pH 8.1 Tris-HCl, 10 mM EDTA, 1% SDS and protease inhibitors) for 20 min at 4°C. The nuclear extract was sheared to 200–1,000 bp by sonication on ice for 8 min (pulsed 8 times for 30 s with 30 s intervals using a Misonix Sonicator 3000; Misonix, Inc. Farmingdale, NY). The chromatin solution was then clarified by centrifugation at 14,000 rpm for 10 min at 4°C. Five µl anti-H4AcK16 (Millipore, Billerica, MA) was incubated with the chromatin for 2 h and then was bound to protein A agarose beads at 4°C overnight. The beads were washed three times with 0.1% SDS, 1% Triton, 2 mM EDTA, 20 mM pH 8.0 Tris, and 150 mM NaCl; three times with 0.1% SDS, 1% Triton, 2 mM EDTA, 20 mM pH 8.0 Tris, and 500 mM NaCl; and twice with 10 mM pH 8.1 Tris, 1 mM EDTA, 0.25 M LiCl, 1% NP40, and 1% sodium deoxycholate. The immunoprecipitated DNA was eluted from the beads in 0.1 M NaHCO3 and 1% SDS and incubated at 65°C overnight to reverse cross-linking. DNA was purified by phenol-chloroform extraction and ethanol precipitation. The precipitated DNA for chromatin immunoprecipitation was amplified using a ligation-mediated PCR (LM-PCR) protocol from FlyChip [41]. ChIP was performed on triplicate biological samples.

Microarrays

Six hundred ng of amplified DNA (ChIP enriched DNA or input DNA) were labeled using 6 µg Cy3- or Cy5-labeled random nonamers (Trilink Biosciences, San Diego, CA) with 50 U Klenow (New England Biolabs, Ipswich, MA) and 2 mM dNTPs. The labeled DNA was purified and hybridized to FlyGEM microarrays [42]. Arrays were scanned on an Axon 4000B scanner (Molecular Devices Corporation, Sunnyvale, CA) and signal was extracted with GenePix v.5.1 image acquisition software (Molecular Devices Corporation). Two hundred ng aliquots of the same extracted mRNA used for RNA-Seq were labeled as described [42] and were hybridized to NimbleGen custom 12-plex microarrays at 42°C using a MAUI hybridization station (BioMicro Systems, Salt Lake City, UT) according to manufacturer instructions (NimbleGen Systems, Madison, WI). Arrays were scanned on an Axon 4200AL scanner (Molecular Devices Corporation, Sunnyvale, CA) and data were captured using NimbleScan 2.1 (NimbleGen Systems, Madison, WI).

Western Analysis

Cell lysates were prepared from cells 4 d after dsRNA or mock treatment by boiling for 5 min in NuPAGE LDS sample buffer (Invitrogen, Carlsbad, CA). Samples were run by SDS-PAGE using a 4%–12% Bis-Tris gel (Invitrogen, Carlsbad, CA) and transferred to PVDF membrane. Blots were incubated with anti-MSL antibody (1:500), anti-MOF antibody (1:3,000, gifts of M. Kuroda), or anti-α tubulin antibody (1:10,000, Sigma, St. Louis, MO) and then with HRP-secondary antibodies in PBS buffer with 0.1% Tween 20. Protein signals were detected by Pierce SuperSignal West Dura Extended Duration Substrate (Thermo Fisher Scientific, Rockford, IL). Images were captured using a Fuji LAS3000 Imager and quantified using the Image Gauge software (Fuji Film, Tokyo, Japan).

Data Processing

Both DNA-Seq and RNA-Seq sequence reads were compiled using a manufacturer-provided computational pipeline (Version 0.3) including the Firecrest and Bustard applications [36]. Sequence reads were then aligned with the Drosophila melanogaster assembly (BDGP Release 5, dm3) [6,43] using Eland. Only uniquely mapped reads with fewer than two mismatches were used. For DNA-Seq data, we counted the number of reads in non-overlapping 1 kb regions along each chromosome using all sequenced reads from two technical DNA-Seq libraries and calculated the read density as the number of unique mapped reads per kb per million mapped reads (RPKM) [37]. The breakpoint positions of aneuploid segments were identified using the Bayesian analysis of change point (bcp) package from R [44]. Because some reads mapped to multiple positions in the genome and thus inappropriately lower the deduced copy number in regions with low sequence complexity, we removed all the 1 kb windows with RPKM lower than 2 (RPKM value of one copy = 2.29) prior to change point analysis. Breakpoints with posterior probability > 0.95 were used. Copy number was assigned to segments based on the average segment RPKM value between breakpoints (2.29 ± 1.15 RPKM = 1 copy, 4.58 ± 1.15 RPKM = 2 copy, etc.). Genes spanning two segments were not used in gene expression analysis.

For RNA-Seq data, we counted the number of unique mapped reads within all unique exons of Drosophila FlyBase [45] Release 5.12 annotation (Oct. 2008) and calculated the total number of reads of all unique exons per kb of total length of unique exons per million mapped reads (RPKM) for each annotated gene. The RPKM calculation was done for individual RNA-Seq libraries separately, and then RPKM values were averaged for biological replicates (r² = 0.98 between replicates). Non-expressed genes are not useful for ratiometric analysis and these were therefore excluded. We used RPKM values for intergenic regions to determine expression thresholds. For intergenic regions, the RPKM values were calculated for total number of reads between adjacent gene model pairs. Only 5% of intergenic regions in S2 cells have a RPKM value greater than or equal to 4. Therefore, we called genes with RPKM values no less than 4 in S2 cells as expressed with an estimated type I error rate of 5%.

All microarray data (except CGH) and statistical tests were processed and analyzed in R/Bioconductor [46]. For the ChIP-chip experiments, we used quantile normalization based on the input channel. The distributions of raw and normalized intensities were checked to make sure that normalization was appropriate (i.e., that the skew was maintained). We used the average ChIP/input ratio from biological replicates (r² = 0.40–0.54 between replicates). The ChIP/input ratios in RNAi and mock treated cells were used for K-means clustering analysis with 3 nodes using Euclidean similarity metric and genes on X chromosome and autosomes were clustered separately using Cluster3.0 and then visualized using TreeView [47]. For expression profiling, we normalized using loess within each 12-plex and quantile between 12-plexes. Average probeset log2 intensities were calculated in both channels for each gene. Correlations between array intensities and RPKM values were estimated by Spearman’s rank correlation coefficient.

The comparisons for the distributions of DNA densities or expression values among different chromosomes and different copy numbers were performed using two sample Kolmogorov-Smirnov tests (KS tests). Normalization is inherently problematic when a large fraction of the genome changes expression, as in the RNAi experiments. Suppose that 20% of the genome is encoded on the X chromosome (X) and 80% is encoded on autosomes (A), and that one samples transcripts from a total mRNA pool to generate an expression profile. If X chromosome expression is reduced by half and autosome expression does not change, then autosomal transcripts must be over-sampled in the experiment. Conversely, if the autosome expression is doubled, then X chromosome transcripts must be under-sampled. While it is imprudent to formally state the precise contribution of X chromosome expression changes and autosomal expression changes due to MSL-mediated dosage compensation, we can determine which makes the larger contribution based on the RPKM, total mRNA, and cell count measurements. Using this information, we calculated the log-likelihood value for two hypotheses:

\[
H_0:\quad A_{\mathrm{RNAi}} = A_{\mathrm{WT}},\qquad X_{\mathrm{RNAi}} = \tfrac{1}{2}\,X_{\mathrm{WT}}
\]
\[
H_1:\quad A_{\mathrm{RNAi}} = 2\,A_{\mathrm{WT}},\qquad X_{\mathrm{RNAi}} = X_{\mathrm{WT}}
\]

Here hypothesis H0 states that the expression of autosomes (A) remains the same and the expression of the X chromosome (X) decreases by half after RNAi treatment. Hypothesis H1 states that the expression of autosomes (A) is increased by 2-fold after the RNAi treatment and the expression of X chromosome (X) remains the same. The expected sum of expression in the RNAi treated cells is 90% of wild type for H0 and 180% for H1. E is the measured mRNA per cell. In the duplicate RNA-Seq experiments, we obtained mRNA yields of 0.16 pg and 0.17 pg/cell from mock treated, 0.15 pg and 0.19 pg/cell from Msl2 knockdown, and 0.14 pg and 0.20 pg/cell from Mof knockdown S2 cells.
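The comparison of the two hypotheses can be sketched numerically. The Python below first derives the expected totals (90% and 180% of wild type) from the 20%/80% split of X and autosomal expression, then evaluates a log-likelihood of the squared-deviation form used in this section on the reported mRNA yields. The function names are hypothetical. The scale term in the denominator, written a in the expressions, is not defined in the text; taking it to be the sample standard deviation of each condition is an assumption made here, so the resulting difference is illustrative and need not reproduce the published value.

```python
from statistics import mean, stdev

X_FRACTION = 0.20  # share of expression encoded on the X (from the text)
A_FRACTION = 0.80  # share of expression encoded on autosomes (from the text)


def expected_total(a_fold, x_fold):
    """Expected total expression relative to wild type."""
    return A_FRACTION * a_fold + X_FRACTION * x_fold


# mRNA yields in pg per cell, as reported in the text.
yields = {"WT": [0.16, 0.17], "Msl2": [0.15, 0.19], "Mof": [0.14, 0.20]}


def log_likelihood(rnai_scale, data):
    """Sum of -1/2 * ((E - expected) / a)^2 over all measurements."""
    wt_mean = mean(data["WT"])
    expected = {"WT": wt_mean,
                "Msl2": rnai_scale * wt_mean,
                "Mof": rnai_scale * wt_mean}
    total = 0.0
    for condition, values in data.items():
        a = stdev(values)  # ASSUMPTION: scale term = sample standard deviation
        total += sum(-0.5 * ((e - expected[condition]) / a) ** 2
                     for e in values)
    return total


h0_scale = expected_total(a_fold=1.0, x_fold=0.5)  # autosomes same, X halved
h1_scale = expected_total(a_fold=2.0, x_fold=1.0)  # autosomes doubled, X same

ll_h0 = log_likelihood(h0_scale, yields)
ll_h1 = log_likelihood(h1_scale, yields)
print(f"expected totals: H0 = {h0_scale:.2f}, H1 = {h1_scale:.2f}")
print(f"log-likelihood difference (H0 - H1) = {ll_h0 - ll_h1:.1f}")
```

A positive difference favours H0, that is, a decrease in X chromosome expression rather than a doubling of autosomal expression.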

\[
\log L(H_0) = \sum_{i=1}^{n} -\frac{1}{2}\left[
\left(\frac{E_{\mathrm{WT},i} - \bar{E}_{\mathrm{WT}}}{a_{\mathrm{WT}}}\right)^{2}
+ \left(\frac{E_{\mathrm{Msl2},i} - 0.9\,\bar{E}_{\mathrm{WT}}}{a_{\mathrm{Msl2}}}\right)^{2}
+ \left(\frac{E_{\mathrm{Mof},i} - 0.9\,\bar{E}_{\mathrm{WT}}}{a_{\mathrm{Mof}}}\right)^{2}
\right]
\]
\[
\log L(H_1) = \sum_{i=1}^{n} -\frac{1}{2}\left[
\left(\frac{E_{\mathrm{WT},i} - \bar{E}_{\mathrm{WT}}}{a_{\mathrm{WT}}}\right)^{2}
+ \left(\frac{E_{\mathrm{Msl2},i} - 1.8\,\bar{E}_{\mathrm{WT}}}{a_{\mathrm{Msl2}}}\right)^{2}
+ \left(\frac{E_{\mathrm{Mof},i} - 1.8\,\bar{E}_{\mathrm{WT}}}{a_{\mathrm{Mof}}}\right)^{2}
\right]
\]

The difference between the log-likelihood of H0 and the log-likelihood of H1 (26.4) suggests that X chromosome expression change contributes more than autosomal expression change to the observed measurements of expression in wild type cells relative to RNAi treated cells.

Comparative Genomic Hybridization (CGH)

DNA was isolated from Drosophila S2-DRSC cells obtained from the Drosophila Genomics Resource Center (#181, Bloomington, IN) and from w1118 0–2 h embryos as described previously [48]. The isolated cell line and embryonic DNA were labeled with either Cy5 or Cy3 conjugated dUTP and subsequently hybridized to a custom Agilent genomic tiling array (GEO; GPL7787). Changes in copy number along each of the Drosophila chromosome arms were detected by a dynamic programming algorithm which divided each arm into the optimal number of copy number segments [49].

Supporting Information

Figure S1 Copy number determination by Bayesian Change Point Analysis of DNA-Seq read density. Found at: doi:10.1371/journal.pbio.1000320.s001 (1.12 MB PDF)

Figure S2 DNA-Seq densities of each copy number defined by DNA-Seq copy number calls or CGH copy number calls. Found at: doi:10.1371/journal.pbio.1000320.s002 (0.07 MB PDF)

Figure S3 RNA-Seq and array expression profiling. Found at: doi:10.1371/journal.pbio.1000320.s003 (2.01 MB PDF)

Table S1 Copy number segments based on DNA-Seq. Found at: doi:10.1371/journal.pbio.1000320.s004 (0.04 MB XLS)

Table S2 Copy number validation by DNA-Seq and CGH. Found at: doi:10.1371/journal.pbio.1000320.s005 (0.09 MB DOC)

Table S3 The number of genes in each copy number category. Found at: doi:10.1371/journal.pbio.1000320.s006 (0.03 MB DOC)

Acknowledgments

We thank members of Oliver laboratory and Carson Chow for helpful discussion and comments on the manuscript, Mathias Beller for help with cell culture and RNAi experiments, David Clark for help with ChIP experiments, Mitzi Kuroda for anti-Msl2 and anti-Mof reagents, and the NIDDK genomics core for assistance with Illumina sequencing.

Author Contributions

The author(s) have made the following declarations about their contributions: Conceived and designed the experiments: YZ SKP ES DMM BO. Performed the experiments: YZ SKP. Analyzed the data: YZ JHM SKP VP DMM BO. Contributed reagents/materials/analysis tools: JHM ES. Wrote the paper: YZ BO.

Accession Numbers

All Seq and array data sets are available at GEO under accession number GSE16344. The CGH data set is available at modENCODE submission ID 596.

References

1. Henrichsen CN, Chaignat E, Reymond A (2009) Copy number variants, diseases and gene expression. Hum Mol Genet 18: R1–R8.
2. Payer B, Lee JT (2008) X chromosome dosage compensation: how mammals keep the balance. Annu Rev Genet 42: 733–772.
3. Vanneste E, Voet T, Le Caignec C, Ampe M, Konings P, et al. (2009) Chromosome instability is common in human cleavage-stage embryos. Nat Med 15: 577–583.
4. Veitia RA, Bottani S, Birchler JA (2008) Cellular reactions to gene dosage imbalance: genomic, transcriptomic and proteomic effects. Trends Genet 24: 390–397.
5. Lindsley DL, Sandler L, Baker BS, Carpenter AT, Denell RE, et al. (1972) Segmental aneuploidy and the genetic gross structure of the Drosophila genome. Genetics 71: 157–184.
6. Hoskins RA, Carlson JW, Kennedy C, Acevedo D, Evans-Holm M, et al. (2007) Sequence finishing and mapping of Drosophila melanogaster heterochromatin. Science 316: 1625–1628.
7. Weaver BA, Cleveland DW (2006) Does aneuploidy cause cancer? Curr Opin Cell Biol 18: 658–667.
8. Cherry S (2008) Genomic RNAi screening in Drosophila S2 cells: what have we learned about host-pathogen interactions? Curr Opin Microbiol 11: 262–270.
9. Schneider I (1972) Cell lines derived from late embryonic stages of Drosophila melanogaster. J Embryol Exp Morphol 27: 353–365.
10. Copps K, Richman R, Lyman LM, Chang KA, Rampersad-Ammons J, et al. (1998) Complex formation by the Drosophila MSL proteins: role of the MSL2 RING finger in protein complex assembly. Embo J 17: 5409–5417.
11. Lucchesi JC, Kelly WG, Panning B (2005) Chromatin remodeling in dosage compensation. Annu Rev Genet 39: 615–651.
12. Belote JM, Lucchesi JC (1980) Control of X chromosome transcription by the maleless gene in Drosophila. Nature 285: 573–575.
13. Gupta V, Parisi M, Sturgill D, Nuttall R, Doctolero M, et al. (2006) Global analysis of X-chromosome dosage compensation. J Biol 5: 3.
14. Kelley RL, Solovyeva I, Lyman LM, Richman R, Solovyev V, et al. (1995) Expression of msl-2 causes assembly of dosage compensation regulators on the X chromosomes and female lethality in Drosophila. Cell 81: 867–877.
15. Akhtar A, Becker PB (2000) Activation of transcription through histone H4 acetylation by MOF, an acetyltransferase essential for dosage compensation in Drosophila. Mol Cell 5: 367–375.
16. Kind J, Vaquerizas JM, Gebhardt P, Gentzel M, Luscombe NM, et al. (2008) Genome-wide analysis reveals MOF as a key regulator of dosage compensation and gene expression in Drosophila. Cell 133: 813–828.
17. Ruthenburg AJ, Li H, Patel DJ, Allis CD (2007) Multivalent engagement of chromatin modifications by linked binding modules. Nat Rev Mol Cell Biol 8: 983–994.
18. Bhadra MP, Bhadra U, Kundu J, Birchler JA (2005) Gene expression analysis of the function of the male-specific lethal complex in Drosophila. Genetics 169: 2061–2074.
19. Stenberg P, Lundberg LE, Johansson AM, Ryden P, Svensson MJ, et al. (2009) Buffering of segmental and chromosomal aneuploidies in Drosophila melanogaster. PLoS Genet 5: e1000465. doi:10.1371/journal.pgen.1000465.
20. Birchler JA, Hiebert JC, Paigen K (1990) Analysis of autosomal dosage compensation involving the alcohol dehydrogenase locus in Drosophila melanogaster. Genetics 124: 679–686.
21. Devlin RH, Holm DG, Grigliatti TA (1982) Autosomal dosage compensation Drosophila melanogaster strains trisomic for the left arm of chromosome 2. Proc Natl Acad Sci U S A 79: 1200–1204.
22. Heylighen F, Joslyn C (2001) Cybernetics and second-order cybernetics. In: Meyers RA, ed. Encyclopedia of physical science & technology (3rd ed). New York: Academic Press. pp 1–23.

PLoS Biology | www.plosbiology.org

11

February 2010 | Volume 8 | Issue 2 | e1000320


Expression in Aneuploid Cells

36. Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, et al. (2008) Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456: 53–59. 37. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621–628. 38. Caplen NJ, Fleenor J, Fire A, Morgan RA (2000) dsRNA-mediated gene silencing in cultured Drosophila cells: a tissue culture model for the analysis of RNA interference. Gene 252: 95–105. 39. Kulkarni MM, Booker M, Silver SJ, Friedman A, Hong P, et al. (2006) Evidence of off-target effects associated with long dsRNAs in Drosophila melanogaster cellbased assays. Nat Methods 3: 833–838. 40. Ren B, Robert F, Wyrick JJ, Aparicio O, Jennings EG, et al. (2000) Genomewide location and function of DNA binding proteins. Science 290: 2306–2309. 41. Birch-Machin I, Gao S, Huen D, McGirr R, White RA, et al. (2005) Genomic analysis of heat-shock factor targets in Drosophila. Genome Biol 6: R63. 42. Johnston R, Wang B, Nuttall R, Doctolero M, Edwards P, et al. (2004) FlyGEM, a full transcriptome array platform for the Drosophila community. Genome Biol 5: R19. 43. Adams MD, Celniker SE, Holt RA, Evans CA, Gocayne JD, et al. (2000) The genome sequence of Drosophila melanogaster. Science 287: 2185–2195. 44. Erdman C, Emerson JW (2008) A fast Bayesian change point analysis for the segmentation of microarray data. Bioinformatics 24: 2143–2148. 45. Wilson RJ, Goodman JL, Strelets VB (2008) FlyBase: integration and improvements to query tools. Nucleic Acids Res 36: D588–D593. 46. Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, et al. (2004) Bioconductor: open software development for computational biology and bioinformatics. Genome Biol 5: R80. 47. Eisen MB, Spellman PT, Brown PO, Botstein D (1998) Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A 95: 14863–14868. 48. 
MacAlpine DM, Rodriguez HK, Bell SP (2004) Coordination of replication and transcription along a Drosophila chromosome. Genes Dev 18: 3094–3105. 49. Huber W, Toedling J, Steinmetz LM (2006) Transcript mapping with highdensity oligonucleotide tiling arrays. Bioinformatics 22: 1963–1970.

23. Darzacq X, Shav-Tal Y, de Turris V, Brody Y, Shenoy SM, et al. (2007) In vivo dynamics of RNA polymerase II transcription. Nat Struct Mol Biol 14: 796–806. 24. Kacser H, Burns JA (1981) The molecular basis of dominance. Genetics 97: 639–666. 25. Ptashne M (2004) A genetic switch: phage lambda revisited. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press. xiv, 154 p. 26. Mileyko Y, Joh RI, Weitz JS (2008) Small-scale copy number variation and large-scale changes in gene expression. Proc Natl Acad Sci U S A 105: 16659–16664. 27. Franke A, Dernburg A, Bashaw GJ, Baker BS (1996) Evidence that MSLmediated dosage compensation in Drosophila begins at blastoderm. Development 122: 2751–2760. 28. Alekseyenko AA, Larschan E, Lai WR, Park PJ, Kuroda MI (2006) Highresolution ChIP-chip analysis reveals that the Drosophila MSL complex selectively identifies active genes on the male X chromosome. Genes Dev 20: 848–857. 29. Kind J, Akhtar A (2007) Cotranscriptional recruitment of the dosage compensation complex to X-linked target genes. Genes Dev 21: 2030–2040. 30. Gilfillan GD, Konig C, Dahlsveen IK, Prakoura N, Straub T, et al. (2007) Cumulative contributions of weak DNA determinants to targeting the Drosophila dosage compensation complex. Nucleic Acids Res 35: 3561–3572. 31. Straub T, Grimaud C, Gilfillan GD, Mitterweger A, Becker PB (2008) The chromosomal high-affinity binding sites for the Drosophila dosage compensation complex. PLoS Genet 4: e1000302. doi:10.1371/journal.pgen.1000302. 32. Sturgill D, Zhang Y, Parisi M, Oliver B (2007) Demasculinization of X chromosomes in the Drosophila genus. Nature 450: 238–241. 33. Zhang Y, Oliver B (2007) Dosage compensation goes global. Curr Opin Genet Dev 17: 113–120. 34. Altug-Teber O, Bonin M, Walter M, Mau-Holzmann UA, Dufke A, et al. (2007) Specific transcriptional changes in human fetuses with autosomal trisomies. Cytogenet Genome Res 119: 171–184. 35. Laffaire J, Rivals I, Dauphinot L, Pasteau F, Wehrle R, et al. 
(2009) Gene expression signature of cerebellar hypoplasia in a mouse model of Down syndrome during postnatal development. BMC Genomics 10: 138.

PLoS Biology | www.plosbiology.org

12

February 2010 | Volume 8 | Issue 2 | e1000320


2-D Structure of the A Region of Xist RNA and Its Implication for PRC2 Association

Sylvain Maenner1, Magali Blaud1, Laetitia Fouillen2, Anne Savoye1, Virginie Marchand1¤, Agnès Dubois3, Sarah Sanglier-Cianférani2, Alain Van Dorsselaer2, Philippe Clerc3, Philip Avner3, Athanase Visvikis1, Christiane Branlant1*

1 AREMS, Nancy Université, UMR 7214 CNRS-UHP 1, Faculté des Sciences et Techniques, BP 70239, Vandoeuvre-lès-Nancy, France, 2 Laboratoire de Spectrométrie de Masse BioOrganique, Institut Pluridisciplinaire Hubert Curien, Département des Sciences Analytiques, Université de Strasbourg, CNRS UMR 7178, ECPM, Strasbourg, France, 3 Génétique Moléculaire Murine, CNRS2578, Institut Pasteur, Paris, France

Abstract
In placental mammals, inactivation of one of the X chromosomes in female cells ensures sex chromosome dosage compensation. The 17 kb non-coding Xist RNA is crucial to this process and accumulates on the future inactive X chromosome. The most conserved Xist RNA region, the A region, contains eight or nine repeats separated by U-rich spacers. It is implicated in the recruitment of late inactivated X genes to the silencing compartment and likely in the recruitment of complex PRC2. Little is known about the structure of the A region and more generally about Xist RNA structure. Knowledge of its structure is restricted to an NMR study of a single A repeat element. Our study is the first experimental analysis of the structure of the entire A region in solution. By the use of chemical and enzymatic probes and FRET experiments, using oligonucleotides carrying fluorescent dyes, we resolved problems linked to sequence redundancies and established a 2-D structure for the A region that contains two long stem-loop structures each including four repeats. Interactions formed between repeats and between repeats and spacers stabilize these structures. Conservation of the spacer terminal sequences allows formation of such structures in all sequenced Xist RNAs. By combination of RNP affinity chromatography, immunoprecipitation assays, mass spectrometry, and Western blot analysis, we demonstrate that the A region can associate with components of the PRC2 complex in mouse ES cell nuclear extracts. Whilst a single four-repeat motif is able to associate with components of this complex, recruitment of Suz12 is clearly more efficient when the entire A region is present. Our data with their emphasis on the importance of inter-repeat pairing change fundamentally our conception of the 2-D structure of the A region of Xist RNA and support its possible implication in recruitment of the PRC2 complex.

Citation: Maenner S, Blaud M, Fouillen L, Savoye A, Marchand V, et al. (2010) 2-D Structure of the A Region of Xist RNA and Its Implication for PRC2 Association. PLoS Biol 8(1): e1000276. doi:10.1371/journal.pbio.1000276

Academic Editor: Kathleen Hall, Washington University School of Medicine, United States of America

Received August 3, 2009; Accepted November 25, 2009; Published January 5, 2010

Copyright: © 2010 Maenner et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This study was supported by the French Centre National de la Recherche Scientifique (CNRS, http://www.cnrs.fr/), the French Ministry of "Enseignement Superieur et la Recherche" (http://www.enseignementsup-recherche.gouv.fr/), the French National Agency for Research (ANR, http://www.agence-nationale-recherche.fr/) (contract Number ANR 07 BLAN 0047 102), Institut Pasteur, European Union contract from Epigenome NoE, Region Alsace, and AliX. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

Abbreviations: CMCT, N-cyclohexyl-N′-(2-morpholinoethyl)-carbodiimide metho-p-toluolsulfonate; DMS, dimethylsulfate; Eed, embryonic ectoderm development; ES cell, embryonic stem cell; Ezh2, enhancer of zeste homolog 2; FRET, Förster resonance energy transfer; HIV, human immunodeficiency virus; HOTAIR, HOX antisense intergenic RNA; Lnx3, ligand of numb-protein X 3; NMR, nuclear magnetic resonance; PRC2, polycomb group protein 2; PTB, polypyrimidine tract-binding protein; Rbap46–48, retinoblastoma-binding protein p46–p48; RNP, ribonucleoprotein particle; RRM, RNA recognition motif; Suz12, suppressor of zeste 12 protein homolog; XCI, X chromosome inactivation; Xist, X (inactive)-specific transcript

* E-mail: christiane.branlant@maem.uhp-nancy.fr

¤ Current address: Developmental Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany

PLoS Biology | www.plosbiology.org | January 2010 | Volume 8 | Issue 1 | e1000276

Introduction
In mammals, the transcriptional silencing of one of the two X chromosomes in female cells (X chromosome inactivation, XCI) ensures sex chromosome dosage compensation [1]. Once acquired early in development, the inactivated state is faithfully inherited through successive cell divisions. XCI initiation is associated with increased Xist RNA transcription. Whilst first retained near its transcription site, Xist RNA then spreads along the entire X chromosome from which it has been transcribed [2–5], whilst a series of epigenetic marks, which include the repressive histone modifications H3K27me3 and H3K9me3, are recruited to the presumptive inactive X chromosome. Xist RNA is a long non-coding RNA (17 kb in length in the mouse), which is capped, spliced, and polyadenylated. Little is known about its structure and mechanism of action. The Xist gene has a complex origin. It includes degenerated pieces of an ancient protein gene Lnx3 as well as genomic repeat elements derived at least in part from transposon integration events [6,7]. The most conserved Xist RNA regions correspond to repeat elements (denoted A to E in mouse [8]), which are organized as tandem arrays. The A region (positions 292 to 713 in mouse, accession no. gi|37704378|ref|NR_001463.2| [2], and 350 to 770 in human, accession no. gi|340393|gb|M97168.1| [5]) is the most highly conserved of the repeat regions and is critical for initiation of XCI. The observation that female mouse embryos carrying a mutated XistΔA gene inherited from males are selectively lost during embryogenesis underlines the importance of this element [9]. Recent data have shown that an early event in silencing is the formation of a Xist RNA compartment and that the A region, whilst not necessary for formation of this compartment, is needed for relocation of X-linked genes into this territory [10]. Over-expression of a XistΔA RNA in transgenic mouse ES cells indicates that the A region, whilst not necessary for Xist coating, is implicated in the recruitment of the PRC2 complex [11–16]. The PRC2 complex contains the Suz12, Eed, Ezh2, and Rbap46–48 proteins [17,18]. Eed and Suz12 have been proposed to bind nucleic acids [19,20], whereas Rbap46–48 may interact with nucleosome protein components [17]. Lysine 27 trimethylation of histone H3 is catalysed by Ezh2 [12,14] and both Eed and Suz12 are required for this activity [20]. Recently, a short 1,600-nucleotide-long RNA which contains the A region at its 5′ extremity was suggested to be expressed early in XCI initiation and to bind the PRC2 complex [21].

Since the function of Xist RNA is expected to depend on its 2-D structure, studies aimed at establishing the 2-D structure of the Xist A region have considerable interest. Based on the nucleotide sequence of the A region and computer prediction, Wutz and colleagues have proposed that each repeat forms two short stem-loop structures [11]. Recent NMR analysis confirmed that one of these stem-loop structures can be formed in vitro by an RNA molecule bearing a single copy of the mouse repeat A sequences [22]. In Xist RNA, the repeat sequences are, however, separated by long spacer regions (21 to 48 nt long for mouse). Since current models fail to take account of this sequence complexity, an experimental analysis of the entire A region was thought likely to provide valuable information on the structure of the A region. As conventional probing experiments are, however, hindered by the presence of the repeated sequences and long U tracks, we applied a combined approach exploiting both chemical and enzymatic probing of RNA structure in solution and FRET experiments using fluorescent oligonucleotide probes complementary to different parts of the A region. Using this dual approach, we could show that repeats in the A region interact with each other to form long irregular stem-loop structures. Such inter-repeat interactions appear to be required for the binding of the various components of the PRC2 complex. We identified the minimal number of repeats necessary for such binding. The implications of our results within the wider context of X-inactivation and of the XCI mechanism(s) underlying silencing are discussed.

Author Summary
In placental mammal females, Xist RNA is crucial for inactivation of one of the two X chromosomes in order to maintain proper X chromosome dosage. It is known that the conserved A region of Xist RNA, which contains eight or nine repeated elements, plays an essential role in this process; however, little is known about its structure and mechanism of action. By using chemical and enzymatic probes, as well as FRET experiments, we performed the first experimental analysis of the solution structure of the entire Xist A region. Both mouse and human A regions were found to form two long stem-loop structures each containing four repeats. In contrast to previous predictions, interactions take place both between repeats and between repeats and spacers. Affinity-purification of RNA-protein complexes formed by incubation of RNA in mouse ES cell nuclear extract, followed by mass spectrometry and antibody-based analyses of their protein contents, showed that the isolated 4-repeat structures from the A region can recruit components of the PRC2 complex that is needed for X chromosome inactivation. However, association of one component of this complex, Suz12, was more efficient when the entire A region was used.

Results
Probing of Mouse and Human A Region 2-D Structures
We analysed in parallel both the entire mouse and human A regions, as the sequence divergence of the inter-repeat linking sequence between mouse and human was expected to provide insight into how the spacer regions might influence the A repeat structure. The specific primers for the two A regions are listed in Table S1. To test if the A region interacts with neighbouring Xist RNA sequences, an RNA that contained only the mouse A region (positions 277 to 760 in mouse Xist RNA) and a larger RNA including the sequence extending from positions 1 to 1137 were studied in parallel by limited enzymatic digestion. Very similar digestion profiles (Figure 1A and B) were obtained for the two RNAs when digestions were performed on the T7 RNA transcripts after folding under the conditions outlined in Materials and Methods. We conclude that the A region probably folds on itself without major interaction with other upstream and downstream Xist sequences. Hence, our subsequent analyses of the 2-D structure of the Xist A region were carried out in the absence of flanking sequences (positions 227 to 760 and 330 to 796 for the mouse and human A regions, respectively). Each enzymatic digestion and chemical modification assay was carried out in duplicate using different transcript preparations, and each extension analysis was repeated 2 or 3 times for each primer. Representative examples of the primer extension analyses are provided in Figure 1 and Figure S1A–S1E.

Figure 1. Probing of mouse Xist A region RNA structure alone and inside the 5′-terminal region. The two RNAs (A, 5′-terminal 1137-nt-long RNA; B, A region) were in vitro transcribed and renatured as described in Materials and Methods, before being subjected to limited digestion with T1, T2, or V1 RNases under the conditions described in Materials and Methods. Extension analyses were performed using oligonucleotide 3866 (Table S1) as the primer. The resulting cDNAs were fractionated by electrophoresis on 7% denaturing polyacrylamide gel. Lanes U, G, C, and A correspond to the sequencing ladder obtained with the same primer. Lanes marked Contr correspond to primer extension analysis of undigested RNA transcripts. Nucleotide numbering on the left-hand side of the autoradiogram takes the first residue of mouse Xist RNA as residue 1. The sequences corresponding to repeats 3 and 4 are indicated by vertical bars on the right-hand side of the autoradiograms. doi:10.1371/journal.pbio.1000276.g001

M-Fold Assisted Modelling of the Mouse A Region 2-D Structure
The structure proposed by Wutz and colleagues, in which each of the repeats folds into a double stem-loop structure, could not explain the numerous V1 RNase cleavages that we observed with both the mouse and human A regions (Figure 2) [11]. We explored the possibility that each repeat folds into a unique longer stem-loop structure. Such folding was similarly unable to explain V1 RNase cleavages (Figure S2). We conclude that the 2-D structure may involve interactions between repeats and spacers and inter-repeat interactions. There is, however, a multitude of potential ways for duplex formation between repeats (Figures 3–5, models 1–3). Our design of the putative structure was orientated by the detection of six successive strong V1 RNase cleavages in the central poly A sequence (positions 550 to 555), suggesting the involvement of this segment in a helical structure. The strong modification by DMS of a sequence immediately downstream (positions 555 to 561) was an indication for a single-stranded state. One possible explanation for these data was the formation of a central stem-loop structure called SLS2M with a U track on one strand and an A track on the other strand (Figure 5). Formation of this central stem-loop structure was subsequently imposed as a constraint when exploring the possible folding of the mouse A region. This excluded structures in which two successive repeats would interact with each other (Model 1, Figure 3), since in this case, the entire poly A region would interact with the poly U track located upstream of repeat 3, which was not in agreement with the probing data (Figure 3). Another possible structure involved formation of an interaction between the 5′ and 3′ halves of the A region. This would generate a very long irregular stem-loop structure with an A-rich terminal loop (Model 2, Figure 4). An alternative structure involved the folding on themselves of the 5′ and 3′ parts of the A region, with SLS2M in between (Model 3, Figure 5). Several other alternative pairings of the repeats were also explored; none fitted the chemical and enzymatic data perfectly.

The notion of independent folding of the 5′ part of the A region (positions 318 to 521 in mouse RNA) was supported by M-fold analysis of this segment, which identified a highly stable long irregular stem-loop structure, SLS1M (ΔG = −41.96 kcal/mol at 0°C in 3 M NaCl), in which repeat 1 interacts with repeat 4 and repeat 2 interacts with repeat 3. It is the most thermodynamically stable structure proposed for this 5′ segment and was predicted irrespective of whether the experimental data were introduced as a constraint in the M-fold search. In SLS1M, each repeat interacts both with another repeat and with a spacer segment, increasing the stability of the overall structure. Similarly, M-fold analysis of the 3′ part of the mouse A region suggested that repeat 5 interacts with repeat 8 and a spacer region and repeat 6 with repeat 7 and a spacer region. The resulting SLS3M structure was predicted as the most favourable structure by M-fold when the experimental data were incorporated as a constraint. The overall predicted three stem-loop structure (Model 3) has a low calculated free energy (−77.76 kcal/mol) and has a better fit to the experimental data than Models 1 and 2 (Figures 3 and 4), suggesting that, in solution, Model 3 is the most likely structure among the numerous possibilities.
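The way candidate foldings are weighed against the probing data in this section can be illustrated with a minimal Python sketch. It assumes that V1 RNase cleavages mark nucleotides in helical regions and that T1/T2 cleavages and DMS/CMCT modifications mark unpaired nucleotides, and it reports the fraction of probed positions that a dot-bracket model accounts for. The function names, the two toy structures, and the probed positions are hypothetical and are not taken from the study; this scoring is not part of M-fold.

```python
from typing import Iterable, Set


def paired_positions(dot_bracket: str) -> Set[int]:
    """Return the 0-based indices that are base-paired in a dot-bracket string."""
    stack, paired = [], set()
    for i, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            paired.update((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure")
    return paired


def probing_agreement(dot_bracket: str,
                      helix_hits: Iterable[int],
                      single_strand_hits: Iterable[int]) -> float:
    """Fraction of probed positions whose pairing state matches the model.

    helix_hits: positions with V1-type signals (expected to be paired).
    single_strand_hits: positions with T1/T2/DMS/CMCT-type signals
    (expected to be unpaired).
    """
    paired = paired_positions(dot_bracket)
    helix_hits = list(helix_hits)
    single_strand_hits = list(single_strand_hits)
    explained = sum(1 for p in helix_hits if p in paired)
    explained += sum(1 for p in single_strand_hits if p not in paired)
    total = len(helix_hits) + len(single_strand_hits)
    return explained / total if total else 0.0


# Hypothetical 16-nt example: two alternative foldings of the same RNA.
model_a = "((((....))))...."
model_b = "....((((....))))"
v1_hits = [1, 2, 9, 10]      # invented helix-specific cleavages
dms_hits = [5, 6, 13]        # invented single-strand modifications

score_a = probing_agreement(model_a, v1_hits, dms_hits)
score_b = probing_agreement(model_b, v1_hits, dms_hits)
```

In this toy case the first folding explains every probed position and the second explains none, mirroring the kind of argument used to prefer one model over another; real data would also need the signal intensities and the known reactivity biases of each probe.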

Figure 2. Representation of experimental data on the previously proposed 2-D structure of the Xist A region. Each of the seven repeats as well as the eighth half repeat from the mouse A region was folded according to the previously predicted two stem-loop structure [11]. T1, T2, and V1 RNase cleavages are represented by arrows surmounted by circles, triangles, and squares, respectively. Nucleotides modified by DMS or CMCT are circled. Colours of circles and arrows indicate the yields of modifications and cleavages: red, yellow, and green for strong, medium, and low modification or cleavage, respectively. The V1 RNase cleavages and chemical modifications that cannot be explained by the two stem-loop structure models are encircled by blue lines. doi:10.1371/journal.pbio.1000276.g002

The Possibility to Form Structure 3 Is Phylogenetically Conserved
If a structure has biological relevance, it is generally conserved throughout evolution. Therefore, we tested whether the most favourable structures identified for the mouse A region were relevant to the human A region in solution. The sequence of the human A region differs from that of mouse by the presence of an additional repeat 5 and the absence of a long central poly A region. Experimental data (Figures 6 and 7 and Figure S3A–E) suggested that the central repeat 5 forms a central stem-loop structure (SLS2H). Based on this, structures similar to mouse Models 2 and 3 could be proposed for the human A region, which involve either a long irregular stem-loop structure including all the repeats (Model 2) or a three stem-loop structure (Model 3) with repeats 1 to 4 forming a first stem-loop structure (SLS1H), repeat 5 folded alone in a second stem-loop (SLS2H), and repeats 6 to 9 involved in a third stem-loop structure (SLS3H). As for the mouse A region:
(i) A 2-D structure in which each repeat interacts with its immediate downstream repeat (repeats 1, 3, 6, and 8 with repeats 2, 4, 7, and 9, respectively, Model 1) was not supported by the probing data (segment 688 to 696) (Figure S4);
(ii) M-fold analysis of the 5′ portion of the human A region (positions 370 to 530) either with or without the experimental data as a constraint identified SLS1H as the most stable structure (ΔG = −42.70 kcal/mol) (Figure 7);
(iii) SLS3H was retained as the most stable structure for the 3′ part of the A region, when the experimental data were added as a constraint to an M-fold search; and
(iv) the three stem-loop structure corresponding to Model 3 (ΔG = −86.6 kcal/mol) had the best fit with probing data compared to the other 2-D models.
Further support for Model 3 was provided by our observation of identical patterns of enzymatic cleavage for the entire human A region and for the isolated SLS1H portion (Figure 6).

The maintenance of interactions between both spacers and repeats during mammalian evolution of the A region implies that the nucleotide sequences involved in these interactions were either conserved or subjected to compensatory base changes. This was confirmed by the alignment of the mouse, human, orangutan, baboon, lemur, dog, rabbit, cow, and elephant A region sequences (Figure S5). Nucleotide sequence conservation extends out beyond the repeats themselves for the majority of the repeats, allowing formation of the SLS1 and SLS3 structures in all sequenced Xist RNAs (Figure S6).
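The distinction drawn above between conserved sequences and compensatory base changes can be made concrete with a short Python sketch. Given two columns of a multiple sequence alignment that are proposed to pair, it reports whether every sequence keeps a Watson-Crick or G-U wobble pair, and whether the identity of the pair varies between sequences. The three aligned sequences and their labels are invented for illustration and do not correspond to Figure S5; the function name and the three category labels are likewise assumptions of this sketch.

```python
from typing import Dict

# Watson-Crick pairs plus the G-U wobble pair, written 5'-partner first.
CANONICAL_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def classify_column_pair(alignment: Dict[str, str], i: int, j: int) -> str:
    """Classify alignment columns i and j (0-based) as a candidate base pair.

    Returns:
      "conserved"    - the same canonical pair in every sequence
      "compensatory" - canonical pairs everywhere, but not the same pair
      "disrupted"    - at least one sequence cannot form a canonical pair
    """
    pairs = {(seq[i].upper(), seq[j].upper()) for seq in alignment.values()}
    if not pairs <= CANONICAL_PAIRS:
        return "disrupted"
    return "conserved" if len(pairs) == 1 else "compensatory"


# Hypothetical toy alignment (not real Xist sequences).
toy_alignment = {
    "species_1": "GGAAACC",
    "species_2": "GAAAAUC",
    "species_3": "GGAAAUC",
}

outer = classify_column_pair(toy_alignment, 0, 6)   # G-C in all three
inner = classify_column_pair(toy_alignment, 1, 5)   # G-C, A-U, G-U
loop = classify_column_pair(toy_alignment, 2, 4)    # A-A, cannot pair
```

A "compensatory" result is the pattern that supports a helix most strongly, because the pairing is kept even though the nucleotides themselves changed.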

Figure 3. Representation of experimental data on the possible 2-D structure 1 of the Xist A region. In model 1, the various stem-loop structures all involve two successive repeats. The repeats are indicated by red lines and are numbered from 1 to 8. Representations of chemical and enzymatic data are as in Figure 2. In each of the three panels, segments in which DMS and/or CMCT modifications were identified are indicated on a schematic drawing of the 2-D structure (full and dotted lines for DMS and CMCT, respectively). In segments analyzed by the two chemical reagents, unmodified nucleotides are squared. However, one should take into consideration the fact that G residues are poorly modified by CMCT in the mild conditions that we used. The free energies of each stem-loop structure at 0°C and in 3 M NaCl were calculated with the M-fold software. doi:10.1371/journal.pbio.1000276.g003

Figure 4. Representation of experimental data on the possible 2-D structure 2 of the Xist A region. In model 2, repeats in the 5′ half of the A region interact with repeats in the 3′ half of this region. Representation of enzymatic cleavages and chemical modifications is as in Figures 2 and 3. doi:10.1371/journal.pbio.1000276.g004

FRET Experiments Bring Additional Data in Favour of Model 3
Three oligonucleotide pairs (P1–P5, P2–P4, and P6–P7) were retained in order to test Model 3 by FRET experiments (Figure 8). This model predicts that the P1–P5 and P2–P4 pairs of oligonucleotides interact with the single-stranded segments which border the helix formed by repeats 1 and 4, whilst the P6–P7 pair of oligonucleotides should interact with the single-stranded segments bordering the helix formed by repeats 5 and 8. A marked FRET effect would therefore be expected for these three oligonucleotide pairs if the A region was folded as in Model 3. The distance between the fluorophores of these three pairs of oligonucleotides would, on the other hand, be expected to be much larger if region A was folded as in structures 1 or 2. Whilst tertiary structural interactions might decrease the distances, a lower level of FRET would still be expected for the three pairs of oligonucleotides if the A region was folded according to structures 1 or 2 (Figure 8). The P1 and P6 oligonucleotides bind to two single-stranded segments which flank the helix formed by repeats 1 and 8 in structure 2. A strong FRET effect for P1 and P6 would therefore be expected if the A region was folded according to structure 2. Upon binding to the A region, oligonucleotide P7 partially disrupts the base-pair interactions formed by the central poly A stretch. However, as similar levels of destabilization are expected for the three possible structures, binding of this oligonucleotide was not expected to favour one structure over the other two. The same is true for oligonucleotide P5, which binds to the partner U stretch of the poly A sequence. To monitor the level of FRET obtained for oligonucleotides bordering a helix, we used the short R1–2 transcript containing repeats 1 and 2 and their bordering sequences, which adopts a single unique 2-D structure, together with the P1–P3′ oligonucleotide pair (Figure S7). Other controls exploited the P3–P6 and P3–P5 pairs, which were not expected to be in close proximity in any of the three proposed structures (Figure 8).

The oligonucleotide pairs used are shown in Figure 8, along with examples of typical fluorescence intensity spectra recorded in FRET experiments for the P2–P4 and P3–P6 pairs (Figure 8D). High FRET signals in the range of 50% were obtained for the P1/P5, P2/P4, and P7/P6 oligonucleotide pairs, whilst lower FRET signals were observed for the P1–P6 (35%) and especially the P3–P6 and P3–P5 oligonucleotide pairs (25% and 22%, respectively). This is compatible with a large part of the molecules being folded in solution into structure 3. Folding of a large number of molecules into structure 2 would have led to a strong FRET signal for the P1–P6 pair and lower signals for the five other pairs, which was not observed. The strong FRET effects obtained for the P1/P5, P2/P4, and P7/P6 oligonucleotide pairs argue strongly against folding according to structure 1. Based on our FRET data, we conclude that folding predominantly occurs according to Model 3.
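To make explicit why a higher FRET signal is read as closer proximity, the following Python sketch applies the standard Förster relation, E = 1 / (1 + (r/R0)^6), to the signal levels quoted above. It rests on two assumptions that the text does not state: that the reported percentages can be treated as transfer efficiencies, and that all pairs share the same Förster radius R0, so only the ratio r/R0 is computed. The function names are illustrative, and the donor-quenching formula is one common way to estimate E, not necessarily the one used in the study.

```python
def fret_efficiency(donor_alone: float, donor_with_acceptor: float) -> float:
    """Estimate efficiency from donor quenching: E = 1 - F_DA / F_D."""
    if donor_alone <= 0:
        raise ValueError("donor-only intensity must be positive")
    return 1.0 - donor_with_acceptor / donor_alone


def relative_distance(efficiency: float) -> float:
    """Invert E = 1 / (1 + (r/R0)**6) to obtain r/R0."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


# Signal levels quoted in the text, treated here as efficiencies (assumption).
reported = {
    "P1-P5, P2-P4, P7-P6 (about 50%)": 0.50,
    "P1-P6": 0.35,
    "P3-P6": 0.25,
    "P3-P5": 0.22,
}

for pair, efficiency in reported.items():
    print(f"{pair}: r/R0 = {relative_distance(efficiency):.2f}")
```

Because of the sixth-power dependence, the drop from about 50% to 22% corresponds to only a modest increase in r/R0 under these assumptions, which is one reason the comparison between pairs is informative only alongside the probing data and the control pairs.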

Recruitment of the PRC2 Complex by the A Region Previous studies have shown that the PRC2 complex interacts with the Xi [12,14,15,23] and the A region has been proposed to recruit the PRC2 complex through the Ezh2 subunit, which would act as an RNA-binding subunit [21]. We wished to explore further the binding of the PRC2 complex to the A region in the light of our structural data. In particular, we were interested in determining how many A region repeats were required to bind the individual Eed, Ezh2, RbAp46, RbAp48, and Suz12 components of the PRC2 complex. PLoS Biology | www.plosbiology.org

6

January 2010 | Volume 8 | Issue 1 | e1000276


Xist A Region Structure and PRC2 Association

Figure 5. Representation of experimental data on the possible 2-D structure 3 of the Xist A region. In model 3, two stem-loop structures containing four repeats are separated by a small stem-loop corresponding to poly A and poly U sequences. Representation of enzymatic cleavages and chemical modifications are as in Figures 2 and 3. doi:10.1371/journal.pbio.1000276.g005

PLoS Biology | www.plosbiology.org

7

January 2010 | Volume 8 | Issue 1 | e1000276


Xist A Region Structure and PRC2 Association

Figure 6. Probing of the 2-D structures of the entire and 59 half of human A region. Legend as in Figure 1, except that the transcripts correspond to the entire human A repeat region (human A RNA) and the 59 half of the human A region (SLS1H RNA), respectively. doi:10.1371/journal.pbio.1000276.g006

We initiated a proteomic approach based on affinity chromatography purification of complexes formed upon incubation of in vitro transcribed Xist A region RNAs with nuclear extracts, followed by protein identification by mass spectrometry and Western blot analysis. Mouse ES cells are a widely exploited model for the study of XCI initiation, and we reasoned that, because Xist RNA acts as an initiator of XCI, proteins that must interact with this RNA to ensure early Xist functions should already be present in nuclear extracts of female mouse ES cells prior to differentiation. We used a control RNA containing only the three MS2 protein binding sites and tested four RNAs containing different segments of the mouse Xist A region flanked by the three MS2 binding sites at their 3′ end (Figure 9A). These RNAs, denoted 1R/MS2, 2R/MS2, 4R/MS2, Aregion/MS2, and HIV/MS2, contained, respectively, repeat 1 without any neighbouring sequence, repeats 3 and 4 and their bordering spacers (positions 401 to 552 in mouse Xist RNA), the SLS1M stem-loop structure, the entire A region, and a fragment of HIV-1 RNA (positions 5338 to 5514 in the BRU RNA) used as a negative control. To identify the proteins capable of associating with the entire A region, the proteins bound to purified complexes formed on the Aregion/MS2 RNA were analysed by mass spectrometry. Among the numerous proteins detected were protein PTB and components of the PRC2 complex (Ezh2, RbAp46, RbAp48, and Suz12) (Figure S8).

We then evaluated by Western blot experiments the relative amounts of each of the PRC2 components in RNPs formed by the various RNAs tested. Whilst Eed, Ezh2, and PTB were detected in complexes formed on RNAs containing two or more repeats, binding of RbAp46 and RbAp48 was detected only when using RNAs with at least four repeats, and binding of Suz12 only when the entire A region was used (Figure 9C). The control HIV-1 RNA bound none of these proteins (Figure 9B and 9C). To further explore these data, we performed a series of experiments in which fragments of the A region were transcribed in vitro as radio-labelled RNA without MS2 fusion and these RNAs were incubated with mouse ES nuclear extracts. Three distinct RNAs (the complete A region, 4R, and 2R RNAs; Figure 9D and 9E) were used for these experiments. In confirmation of the possible interaction of Eed with an RNA containing only two repeats, trace amounts of 2R RNA were retained on the beads when an anti-Eed antibody was used. Only complexes containing the 4R RNA or the entire A region were retained when anti-Suz12, anti-Ezh2, anti-RbAp46, and anti-RbAp48 antibodies were used. These observations confirmed the importance of the corresponding regions for association of these proteins. However, clearly higher amounts of the entire A region than of 4R RNA were bound when the anti-Suz12 and anti-Ezh2 antibodies were used. We conclude that whilst some segments of the A region allow the binding of particular PRC2 components, the entire A region is required for efficient association of the entire complex.


Figure 7. Representation of experimental data on possible structure 3 of human Xist A region. Repeat 5 is located in a short central stem-loop structure flanked by two larger stem-loop structures that each contain four repeats. Representation of the enzymatic cleavages and chemical modifications are as in Figures 2 and 3. Indication of segments in which DMS and/or CMCT modifications were identified is represented as in Figure 3. The free energies of the stem-loop structures were calculated with the M-fold software. doi:10.1371/journal.pbio.1000276.g007


Figure 8. Steady-state fluorescence studies provide additional support to the A region Model 3. In (A, B, and C), the binding sites of oligonucleotides P1 to P6 used in the FRET experiments are shown for the three 2-D structures of the A region corresponding to Models 1, 2, and 3. The identity of the chromophore present in each oligonucleotide (donor Cy3 or acceptor Cy5) is indicated in green and blue, respectively. Cy3- and Cy5-labeled oligonucleotides were purchased from Eurogentec. As illustrated in (D), the emission fluorescence spectra from 530 to 745 nm of the donor oligonucleotide bound alone to the RNA (green curve) were collected, as well as the emission spectra obtained in the presence of the donor and acceptor oligonucleotides (violet curve). No energy transfer between Cy3- and Cy5-labeled oligonucleotides was detected in solution. The FRET efficiency for each pair of oligonucleotides was defined as the decrease in fluorescence of the donor at 564 nm in the presence of the acceptor. Two representative examples of FRET assays (oligonucleotide pairs P2/P4 and P3/P6) are shown in (D). The FRET efficiencies measured for the six pairs of oligonucleotides are provided in (E) (mean values of three independent experiments). Standard deviations (σ) are shown. The relative FRET efficiencies obtained for each oligonucleotide pair are schematically represented in (A, B, and C) by lines joining the oligonucleotides. The thickness of the lines reflects the efficiency of the FRET effect. doi:10.1371/journal.pbio.1000276.g008


Figure 9. The PRC2 complex assembles on fragments of the A region containing at least four repeats. (A) Representation of the fusion RNAs used for formation of RNP complexes with ES cell nuclear extracts. Retention of RNA containing three MS2 coat protein binding sites (MS2) on amylose beads was mediated by the MS2/MBP fusion protein [31]. Analysis of the protein content of the RNP complexes formed on the A region/MS2 RNA was achieved by mass spectrometry (Figure S8). (B and C) Western blot assays using antibodies specific for the PTB, Suz12, RbAp46, RbAp48, Eed, and Ezh2 proteins were used to evaluate the relative amounts of these proteins in the purified complexes formed with the various RNAs shown in (A). Antibodies were purchased from Santa Cruz (Sc), Abcam (ab), and Calbiochem (cb): anti-Ezh2 (anti-ENX-1 H-80, sc-25383), anti-RbAp46 (ab3535), anti-RbAp48 (ab488), anti-Eed (ab4469), anti-Suz12 (ab12073), and anti-PTB (cb NA63). (D and E) Test of the association of radiolabelled fragments of the A region with components of the PRC2 complex present in nuclear extracts of ES cells. The three RNAs tested are represented in (D). RNAs bound to the Protein G-sepharose beads were fractionated by electrophoresis on 7% denaturing gels. Autoradiograms of the gels are shown in (E). Input corresponds to 10% of the material incubated with the beads. doi:10.1371/journal.pbio.1000276.g009

and −45 kcal/mol), explaining why they are proposed by the Mfold software. In addition, these four-repeat structures may be stabilized in cellulo by RNA-protein interactions. Interestingly, protein PTB, which contains four RNA recognition motifs (RRMs), each able to interact with UCUU(C), UUCUCU, or CUCUCU sequences, showed high affinity for the A region in nuclear extract binding experiments (Figure 9) [25]. As UCUU motifs are present on each side of the large internal loops in the four-repeat structures and in the terminal loop, interactions of the RRMs of a single PTB molecule with these various segments may stabilize the four-repeat structure, as suggested by previously proposed models for PTB-RNA interaction [26]. Although Model 3 fits all the data better than the other models, this does not exclude the possibility of local dynamics in small areas of the SLS1 and SLS3 structures. More precisely, the instability of a few base-pair interactions can explain the presence of both V1 RNase cleavages and chemical modifications in a limited number of very small segments of the A repeat region.

Discussion

The A region of Xist RNA plays an essential role in the X inactivation process. Here, we show that in vitro the repeated elements in the A region of both mouse and human Xist RNAs interact together and with the intervening spacer regions. This leads to the formation of peculiar long irregular stem-loop structures containing four repeats and long U-rich terminal and internal loops. Our proteomic analysis suggests that these four-repeat structures may correspond to functional modules initiating the assembly of the PRC2 complex.

A New Conception of the 2-D Structure of the A Region

Until now, both computer [11] and experimental analyses [22] of the possible 2-D structure of the A region of Xist RNA have treated the individual repeat as the unit of folding. However, the presence of long intervening spacer sequences between the repeats suggests that these spacer sequences may participate in 2-D structure formation, and points to the potential inadequacy of previous models. Our detailed chemical and enzymatic probing of the A region structure in solution, which involved the design of specific primers for reverse transcriptase extension analysis, enabled us to identify for the first time the double-stranded and single-stranded segments making up the A region structure in solution. The data obtained clearly demonstrate that the repeats do not fold on themselves but rather fold one with the other. Chemical and enzymatic probing of an RNA structure in solution often allows the building of a unique 2-D structure model in agreement with the experimental data. Studies on the A region were, however, complicated by the high degree of sequence redundancy. Use of a recently proposed biophysical approach, based on FRET assays [24], helped overcome these difficulties by providing information on the relative distances between the sequences flanking the various repeats. To our knowledge, this approach has so far been used only to confirm the 2-D structure model of telomerase RNA [24]. This method, which uses oligonucleotides carrying donor and acceptor fluorescent dyes that are complementary to single-stranded segments in the studied RNA, proved particularly well suited to the study of the A region, since our probing data identified several long single-stranded segments that were able to bind the oligonucleotide probes. Among the possible 2-D structures for the A region, only one, structure 3, showed perfect agreement with the FRET data.

Structure 3, which contains two long irregular stem-loop structures, each involving four repeats (four-repeat structure), also shows the best agreement with the chemical and enzymatic data. The two long stem-loop structures are separated by a short stem-loop structure, corresponding to a divergent region between mouse and human Xist RNAs. One repeat in this segment is common to all the sequenced Xist RNAs (Figure S5), except mouse RNA. In the latter, it is replaced by a poly A sequence forming a short stem loop with a poly U sequence. Interestingly, nucleotide sequence conservation in the A region extends to the spacer extremities, which contribute to the possibility of forming the four-repeat structures (Figure S5). Although the presence of large internal and terminal loops decreases the stability of stem-loop structures, the predicted free energies of the two four-repeat structures in both mouse and human RNAs have strongly negative values (between −33

Possible Functional Implication of the Four-Repeat Structure

We adapted the affinity purification chromatography originally developed for purifying spliceosome complexes [27] to complexes formed upon incubation of different fragments of the A region with nuclear extracts prepared from undifferentiated mouse ES cells. Coupled with mass spectrometry and Western blot analyses, this approach proved powerful. Together with immunoselection assays performed on assembled RNP complexes, it revealed the capability of four components of the PRC2 complex to associate with an RNA corresponding to one of the four-repeat structures formed by the A region. However, our observation that the entire A region is needed for efficient association of the Suz12 protein suggests a putative additional functional role for the entire A region, either in binding Suz12 or in stabilising the binding of Suz12 to the four-repeat structure. It is too early to give a convincing molecular explanation of this observation. Further experiments are needed to understand why Suz12 displays different association properties compared to other members of the PRC2 complex. Whilst UV cross-linking of the RNP complexes formed with ES nuclear extract using the entire A region has confirmed the direct binding of PTB to this RNA region, direct binding of components of the PRC2 complex was not detected (unpublished data). Neither Ezh2 nor Eed, which were previously proposed to be recruited by Xist in an A region dependent manner [21], was cross-linked in significant amounts, suggesting that their association with the A region is mediated via association with other nuclear proteins. Therefore, the peculiar SLS1 and SLS3 structures in the A region may be needed to recruit nuclear proteins which have an affinity for components of the PRC2 complex or to reinforce the RNA affinity for these components.
Mass spectrometry analysis of RNP complexes formed with the entire A region showed that, in addition to components of the PRC2 complex, a large number of other nuclear proteins can associate with this RNA region. In further studies, it will be important to identify which of these proteins are required for PRC2 association and which ones bind directly to the A region structure. Our finding that Suz12 requires the entire A region, or more simply more than four repeats, for efficient association with the RNA is in good agreement with the observation of Wutz and colleagues (2002) that the presence of at least 5.5 repeats is needed to initiate inactivation [11]. Additional support for the functional significance of the four-repeat model comes from a reworking of data obtained by Wutz and colleagues, who tested the effect of a series of mutations within the A region on XCI initiation [11]. Our structural studies show that all the variants (XR, XSR, XCR) classed by Wutz et al. as active are able to form the four-repeat structure, whereas the two inactive variants (XS1 and XNX) cannot (Figure S9). Although several lines of evidence argue in favour of a major role of the four-repeat structure in A repeat activity, we cannot exclude a possible role of alternative structures, for instance in modulating A repeat activity.

using mouse or HeLa cell genomic DNA, and cloned into plasmid pUC18 under the control of a T7 promoter. RNAs were generated by run-off transcription with T7 RNA polymerase as previously described [29]. DNA templates were digested with RNase-free DNase I and RNA transcripts were purified on denaturing 3% to 8% polyacrylamide gels.

Enzymatic and Chemical Probing of RNA Secondary Structure

RNA 2-D structures in solution were probed as follows [29]: 200 ng of transcripts dissolved at an 80 nM concentration in buffer D (20 mM Hepes-KOH, pH 7.9, 100 mM KCl, 0.2 mM EDTA pH 8.0, 0.5 mM DTT, 0.5 mM PMSF, 20% (vol/vol) glycerol) were renatured by 10 min heating at 65°C, followed by slow cooling at room temperature with the addition of 1 µl of 62.5 mM MgCl2 to a final concentration of 3.25 mM MgCl2. After a 10 min preincubation at room temperature, RNase T1 (0.02 or 0.0375 U/ml) or T2 (0.025 or 0.0375 U/ml) was added under conditions such that it cleaved single-stranded segments. V1 RNase (2.5×10⁻³ or 5×10⁻³ U/ml) was used to cleave double-stranded and stacked residues. DMS (1 µl of a 1/4 or 1/8 (V/V) DMS/EtOH solution) was employed to modify single-stranded A and C residues and CMCT (4 or 5 µl of a 180 mg/ml solution) to modify single-stranded U and, to a lesser extent, G residues. Reactions were stopped as described in [29]. Cleavage and modification positions were identified by primer extension [29]. Stable secondary structures having the best fit with experimental data were identified with the Mfold software, version 8.1 [30]. Probing data were introduced as a constraint in the search.
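As a quick plausibility check on the renaturation step, the dilution implied by going from a 62.5 mM MgCl2 stock to a 3.25 mM final concentration can be worked out from C1·V1 = C2·V2. The sketch below is illustrative only: the function names are hypothetical, and it assumes the sample contains no magnesium before the stock is added (buffer D, as listed, has none).

```python
def dilution_factor(stock_mM: float, final_mM: float) -> float:
    """Final volume divided by added stock volume, from C1*V1 = C2*V2."""
    if stock_mM <= 0 or final_mM <= 0 or final_mM > stock_mM:
        raise ValueError("need 0 < final_mM <= stock_mM")
    return stock_mM / final_mM


def sample_volume_before_addition(added_volume: float,
                                  stock_mM: float,
                                  final_mM: float) -> float:
    """Volume of Mg-free sample implied by the dilution (same unit as added_volume)."""
    return added_volume * (dilution_factor(stock_mM, final_mM) - 1.0)


# Values taken from the protocol text: 62.5 mM stock, 3.25 mM final.
factor = dilution_factor(62.5, 3.25)                      # about 19.2-fold
sample = sample_volume_before_addition(1.0, 62.5, 3.25)   # about 18.2 volumes per volume added
print(f"dilution factor: {factor:.2f}, sample volume per unit added: {sample:.2f}")
```

In other words, each volume of MgCl2 stock is added to roughly 18 volumes of RNA solution, which is consistent with a small-scale probing reaction.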

Possible Implication of PRC2 and Other Nuclear Protein Association with the A Region

Although it is clear that the A region is essential for the X inactivation process, the precise role and mechanisms involved in the action of the A region remain unclear. Its deletion was shown to block silencing but not the coating of the X chromosome by Xist [11], an observation in agreement with a possible role of the A region in PRC2 recruitment. PRC2 is needed for deposition of some, but not all, of the epigenetic marks which are specific features of silenced chromatin in general and the inactive X in particular (methylation of histone H3 at position 27) [20]. The association of PRC2 with long ncRNAs before transfer of the PRC2 complex to chromatin may be a general mechanism for chromatin silencing processes that depend on long ncRNAs. Both the HOTAIR and Kcnq1ot1 long ncRNAs, which are involved in gene silencing, were recently found to bind the PRC2 complex [19,28]. Recruitment of PRC2 is a relatively early event in X inactivation [14], in agreement with a possible early association of this complex with Xist RNA prior to extensive Xist coating of chromatin. One could imagine that PRC2 becomes associated with the chromatin upon Xist coating through its interaction with proteins bound to Xist RNA. Alternatively, coating by the Xist RNP may facilitate PRC2 transfer to chromatin by interaction of some of the RNP components with proteins of the chromatin structure. Lee and colleagues recently reported the existence of the 1,600-nucleotide-long RepA RNA carrying the A region at its 5′ extremity, which may be expressed prior to expression of the entire Xist RNA and has been reported to recruit the PRC2 complex in a very early step of XCI [21]. Independent confirmation of these findings will be of major importance to the field.

It will also be important for our understanding of X inactivation to screen the numerous other proteins that we found by mass spectrometry to associate with the entire A region for their possible specific involvement in the recruitment of genes to the X inactivation domain [10] or in other early events characterising the initiation of X inactivation and silencing.

Steady-State Fluorescence Measurements

Fluorescence spectra were recorded at 4°C, with an excitation wavelength of 515 nm and scanning from 500 to 750 nm (excitation and emission bandwidth of 3 nm). The procedure used was derived from [24]. The RNA and Cy3-oligonucleotide were mixed at a 1:1 molar ratio in 160 µl of 150 mM NaCl, 3.25 mM MgCl2, and 15 mM Na citrate (pH 7.0) to a final concentration (0.38 µM) above the Kd, incubated at 85°C for 5 min, and slowly cooled at room temperature for 15 min. After 4 h of incubation at 4°C, the yield of oligonucleotide association was determined by electrophoresis in a non-denaturing gel. Fluorescence in the gel was measured with a Typhoon 9410 scanner (GE Healthcare). When a satisfactory yield of association was detected, the emission of the Cy3-labeled complex was measured on a flux spectrofluorometer (SAFAS). Ten spectra were averaged. Then, the Cy5-labeled oligonucleotide was added at a 1:1 molar ratio, and incubation was carried out at 4°C for 4 h. Ten spectra were recorded and the Fluorescence Resonance Energy Transfer (FRET) for the Cy3–Cy5 pair was calculated taking into account the bound/unbound ratio of Cy3-oligonucleotide. Each FRET experiment was repeated three times using different batches of transcripts.
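The FRET readout described here (averaging repeated scans, then taking the relative decrease in donor emission at 564 nm when the acceptor is present) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names and the intensity values are hypothetical, and the correction for the bound/unbound ratio of the Cy3-oligonucleotide is deliberately left out because its exact form is not given in the text.

```python
from statistics import mean


def average_spectra(spectra):
    """Average repeated scans point by point.

    Each scan is a dict mapping wavelength (nm) to emission intensity;
    all scans are assumed to share the same wavelength grid.
    """
    wavelengths = sorted(spectra[0])
    return {wl: mean(scan[wl] for scan in spectra) for wl in wavelengths}


def fret_efficiency(donor_only, donor_plus_acceptor, wavelength_nm=564):
    """Relative decrease in donor emission at one wavelength: E = 1 - F_DA / F_D."""
    f_d = donor_only[wavelength_nm]
    f_da = donor_plus_acceptor[wavelength_nm]
    if f_d <= 0:
        raise ValueError("donor-only intensity must be positive")
    return 1.0 - f_da / f_d


# Hypothetical intensities (arbitrary units) for three scans each.
donor_scans = [{560: 90, 564: 100}, {560: 92, 564: 102}, {560: 88, 564: 98}]
pair_scans = [{560: 55, 564: 60}, {560: 57, 564: 62}, {560: 53, 564: 58}]

efficiency = fret_efficiency(average_spectra(donor_scans), average_spectra(pair_scans))
print(f"FRET efficiency at 564 nm: {efficiency:.2f}")
```

Averaging before taking the ratio, rather than averaging per-scan ratios, mirrors the order of operations described in the protocol (ten spectra averaged, then FRET calculated).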

Purification of RNP by MS2 Selection Affinity

The entire mouse A region and several fragments were cloned 3′ to a T7 promoter and 5′ to the MS2 tag present in plasmid pAdML3 [27,31]. Nuclear extracts from undifferentiated female ES cells (LF2) were prepared according to [32], and dialyzed against buffer D. One hundred pmol of MS2-tagged RNAs were denatured and renatured as described above, and incubated with a 5-fold molar excess of purified MS2-MBP fusion protein [31] at 4°C for 15 min. The RNA-MS2-MBP complexes formed were incubated

Materials and Methods

RNA Preparation

DNA fragments coding for the entire A regions of mouse and human Xist RNAs and their subfragments were PCR amplified


with amylose beads (40 µl, GE Healthcare) equilibrated in buffer D for 2 h at 4°C. After three washes with 500 µl of buffer D, 1 mg of nuclear extract in 150 µl of buffer D containing 5 µM of yeast tRNAs was added. After 15 min of incubation at 4°C with constant agitation, three successive washes were performed in buffer D and RNP complexes were eluted by incubation with 80 µl of buffer D containing 10 mM maltose (30 min at 4°C). Half of the eluted RNP complex formed with the entire A region was fractionated by 10% SDS-PAGE for mass spectrometry analyses. For all the purified RNP complexes, 10% of the eluted material was used for Western blot analysis performed according to [33].

taking the first residue of mouse Xist RNA as residue 1. The position of the repeats is indicated by vertical bars on the right-hand side of the autoradiograms. Two different analyses of CMCT modifications by primer extension with oligonucleotide 3757 are illustrated in (F). The autoradiogram on the right side of the panel was exposed for a longer time. Found at: doi:10.1371/journal.pbio.1000276.s001 (10.09 MB TIF)

Figure S2 Representation of experimental data on a 2-D structure in which repeats form individual stem-loop structures. Each of the seven repeats as well as the eighth half repeat in the mouse A region were folded into a unique stem-loop structure with an internal loop. T1, T2, and V1 RNase cleavages are represented by arrows surmounted by circles, triangles, and squares, respectively. Nucleotides modified by DMS or CMCT are circled. The colours of circles and arrows indicate the modification and cleavage yields, with red, yellow, and green corresponding, respectively, to strong, medium, and low modification or cleavage. The V1 RNase cleavages and chemical modifications that cannot be explained by this secondary structure model are circled in blue. Found at: doi:10.1371/journal.pbio.1000276.s002 (0.73 MB TIF)

Mass Spectrometry Analysis

Each lane of the SDS-PAGE was cut into 2 mm sections, and proteins were submitted to in-gel trypsin digestion. Analysis of extracted peptides was performed using nano-LC-MS-MS on a CapLC capillary LC system coupled to a QTOF2 mass spectrometer (Waters) according to standard protocols (Figure S8). The MS/MS data were analyzed using the MASCOT 2.2.0 algorithm (Matrix Science) for search against an in-house generated protein database composed of protein sequences of Rattus and Mus downloaded from UniprotKB http://beta.uniprot.org/ (August 07, 2008) and protein sequences of known contaminant proteins such as porcine trypsin and human keratins, concatenated with reversed copies of all sequences. Spectra were searched with a mass tolerance of 0.3 Da for MS and MS/MS data, allowing a maximum of 1 missed cleavage with trypsin and with carbamidomethylation of cysteines, oxidation of methionines, and N-acetyl protein specified as variable modifications. Protein identifications were validated when one peptide had a Mascot ion score above 35. Evaluations were performed using the peptide validation software Scaffold (Proteome Software).
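The search database described above combines target sequences with reversed copies of every sequence, and identifications are accepted when at least one peptide scores above 35. The following sketch illustrates both ideas under stated assumptions: the function names, the `REV_` decoy prefix, and the example sequence and scores are hypothetical, and none of this calls Mascot or Scaffold.

```python
def add_reversed_decoys(proteins, decoy_prefix="REV_"):
    """Return a target-decoy database: every target plus its reversed sequence.

    `proteins` maps an identifier to an amino acid sequence. The decoy
    prefix is an illustrative naming convention, not one given in the text.
    """
    combined = dict(proteins)
    for name, sequence in proteins.items():
        combined[decoy_prefix + name] = sequence[::-1]
    return combined


def passes_ion_score(peptide_scores, threshold=35.0):
    """True if at least one peptide has an ion score strictly above the threshold."""
    return any(score > threshold for score in peptide_scores)


# Hypothetical single-entry database and peptide scores.
database = add_reversed_decoys({"P1": "MKTAYIAK"})
print(sorted(database))                 # target and decoy identifiers
print(passes_ion_score([12.0, 36.5]))   # one peptide above 35
```

Reversed decoys keep the amino acid composition and length of each target, so hits against them give a rough estimate of how often random matches pass the same score threshold.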

Figure S3 Identification of enzymatic cleavage and chemical modification in human A region by primer extension analysis. The A region of human XIST RNA was treated as described for the mouse A region in the legend to Figure 1 of supporting data, except that the primers used for extension analyses were oligonucleotides 4563 (A), 4564 (B), 4622 (C), 4565 (D), and 4242 (E) (Table S1). Found at: doi:10.1371/journal.pbio.1000276.s003 (7.04 MB TIF)

Figure S4 Representation of experimental data on the possible structure 1 of human Xist A region. In this model, stem-loop structures involve two successive repeats. The repeats are indicated by red lines and are numbered from 1 to 9. Representation of chemical and enzymatic data is as in Figures 2 and 3. The free energies of each stem-loop structure at 0°C and in 3 M NaCl were calculated with the M-fold software. Found at: doi:10.1371/journal.pbio.1000276.s004 (0.74 MB TIF)

Immunoselection of RNP

RNA transcripts were dephosphorylated, 5′-end labelled with [γ-32P]ATP (3,000 Ci/mmol), purified, and quantified according to [34]. About 70 pmol of the RNA were denatured and renatured as described above, and incubated with 30 µg of nuclear extract for 30 min at room temperature with constant agitation. About 40 µl of Protein G-sepharose bead suspension, blocked with BSA (2 mg) and coated for 2 h at 4°C with 10 µl of each antibody, were incubated with the RNP complexes for 2 h at 4°C in 300 µl of immunoselection buffer (150 mM NaCl, 10 mM Tris-HCl, pH 8.0, NP40 0.1%). Beads were washed three times for 10 min at 4°C with 750 µl of immunoselection buffer containing 0.5% NP40. RNAs were phenol extracted, ethanol precipitated, fractionated on a 7% polyacrylamide gel, and analysed by autoradiography.

Figure S5 Conservation of sequences surrounding the repeats in vertebrate A regions. Sequence alignment of the mouse, human, orangutan, baboon, lemur, dog, rabbit, cow, horse, and elephant A regions illustrating the degree of species conservation. Identical nts are indicated in red. Repeats are numbered from 1 to 9 and shown as red rectangles. Sequence sources: mouse (gi|37704378|ref|NR_001463.2|), human (gi|340393|gb|M97168.1|), orangutan (by http://www.ensembl.org/index.html, 292L3-1,185272), baboon (by http://www.ensembl.org/index.html, 157F22-1,190936), lemur (by http://www.ensembl.org/index.html, 176F24-1,134555), dog [by http://genome.ucsc.edu/, (canFam2) assembly canFam1_dna range = chrX:60100000–60735000], rabbit (gi|1575009|gb|U50910.1|OCU50910), cow (gi|10181229|gb|AF104906.5|), horse (gi|1575005|gb|U50911.1|), and elephant (BROADE1:scaffold_119260:3220:3899:21 ENSEMBL). Found at: doi:10.1371/journal.pbio.1000276.s005 (0.20 MB DOC)

Supporting Information

Figure S1 Identification of enzymatic cleavage and chemical modification of the mouse A region by primer extension. The A region of mouse Xist RNA was in vitro transcribed and renatured as described in Materials and Methods, before being submitted to limited digestion with the T1, T2, or V1 RNases under the conditions described in Materials and Methods. Primer extension analyses were performed using oligonucleotides 3760 (A), 3973 (B), 3866 (C), 3758 (D), 3971 (E), or 3757 (F) (Table S1) as primers. The resulting cDNAs were fractionated by electrophoresis on 7% denaturing polyacrylamide gels. Lanes U, G, C, and A correspond to the sequencing ladder obtained with the corresponding primers. Lanes marked by Contr correspond to primer extension analysis of undigested RNA. Nucleotide numbering on the left side of the autoradiograms is calculated

Figure S6 The possibility of forming four-repeat stem-loop (SLS1 and SLS3) structures is conserved in vertebrates. SLS1 and SLS3 in mouse, dog, human, rabbit, and elephant are folded according to the mouse SLS1 structure (Model 3). The name of each species is indicated below the structure. Sequence variations compared to the mouse A sequence are indicated in green. Found at: doi:10.1371/journal.pbio.1000276.s006 (1.21 MB TIF)


Figure S7 Control FRET experiment performed with two oligonucleotides bordering one helix. (A) Schematic presentation of the transcripts used in the control experiment. (B) Fluorescence spectra obtained with the donor P1 oligonucleotide bound to naked 2R/RNA (green curve) and with oligonucleotides P1/P3′ bound to the RNA (violet curve). See legend in Figure 8 for details. Found at: doi:10.1371/journal.pbio.1000276.s007 (0.18 MB TIF)

Found at: doi:10.1371/journal.pbio.1000276.s009 (0.54 MB TIF)

Table S1 Oligonucleotides used in this study. The name, sequence, and utilization of each oligonucleotide are given. Nucleotide positions of the A region are numbered according to GenBank accession no. gi|37704378|ref|NR_001463.2| (Mouse Xist gene) [2] and no. gi|340393|gb|M97168.1| (Human XIST gene) [5]. Restriction sites and fluorescent dyes introduced by the oligonucleotides are indicated. Found at: doi:10.1371/journal.pbio.1000276.s010 (0.05 MB DOC)

Figure S8 Identification of components of the PRC2 complex by mass spectrometry. Peptides that served for each protein identification are summarized in a table with corresponding MS/MS spectra. (A) Identification of Enhancer of zeste homolog 2 (Ezh2). (B) Identification of Polycomb protein Suz12 (Suz12). (C) Identification of Retinoblastoma binding protein 4 (RbAp46). (D) Identification of Retinoblastoma binding protein 7 (RbAp48). (E) Detailed standard protocols for proteomic analyses. Found at: doi:10.1371/journal.pbio.1000276.s008 (3.01 MB DOC)

Acknowledgments

Professor R. Lührmann (Max Planck Institut, Goettingen) is thanked for his generous gift of plasmids pMBP-MS2 and pAdML3 and for help in the use of the MS2-MBP RNP purification approach. Professor I. Motorine and C. Aigueperse (AREMS, Nancy) are acknowledged for their advice in implementing this approach. V. Ségault (AREMS, Nancy) is thanked for advice concerning the immunopurification assays. S. Mazeres (IPBS, Toulouse) is thanked for helping us to define the FRET experimental protocol.

Figure S9 Folding capacities of the synthetic A region sequences whose activity was evaluated in [11]. Sequence XR corresponds to the positive control. XSR: replacement of GCCCAUCGCGGG by CGGGAUCGGCCC. XCR: small U-rich spacer regions. XS1: deletion of the GG dinucleotides in the second element of each repetition. XNX: replacement of GGGCAUCGGGGC by GCGCAUCGGAGC. Silencing properties of Xist RNA containing these synthetic A region variants are indicated in the right-hand side panel of the Table S1.

Author Contributions

The author(s) have made the following declarations about their contributions: Conceived and designed the experiments: SM MB VM SSC AVD PC PA AV CB. Performed the experiments: SM MB LF AS AV. Analyzed the data: SM MB LF AS VM SSC PA AV CB. Contributed reagents/materials/analysis tools: AD. Wrote the paper: SM PA AV CB.

References

1. Lyon MF (1961) Gene action in the X-chromosome of the mouse (Mus musculus L.). Nature 190: 372–373.
2. Brockdorff N, Ashworth A, Kay GF, McCabe VM, Norris DP, et al. (1992) The product of the mouse Xist gene is a 15 kb inactive X-specific transcript containing no conserved ORF and located in the nucleus. Cell 71: 515–526.
3. Borsani G, Tonlorenzi R, Simmler MC, Dandolo L, Arnaud D, et al. (1991) Characterization of a murine gene expressed from the inactive X chromosome. Nature 351: 325–329.
4. Cohen HR, Panning B (2007) XIST RNA exhibits nuclear retention and exhibits reduced association with the export factor TAP/NXF1. Chromosoma 116: 373–383.
5. Brown CJ, Hendrich BD, Rupert JL, Lafreniere RG, Xing Y, et al. (1992) The human XIST gene: analysis of a 17 kb inactive X-specific RNA that contains conserved repeats and is highly localized within the nucleus. Cell 71: 527–542.
6. Duret L, Chureau C, Samain S, Weissenbach J, Avner P (2006) The Xist RNA gene evolved in eutherians by pseudogenization of a protein-coding gene. Science 312: 1653–1655.
7. Shevchenko AI, Zakharova IS, Elisaphenko EA, Kolesnikov NN, Whitehead S, et al. (2007) Genes flanking Xist in mouse and human are separated on the X chromosome in American marsupials. Chromosome Res 15: 127–136.
8. Brockdorff N (2002) X-chromosome inactivation: closing in on proteins that bind Xist RNA. Trends Genet 18: 352–358.
9. Hoki Y, Kimura N, Kanbayashi M, Amakawa Y, Ohhata T, et al. (2009) A proximal conserved repeat in the Xist gene is essential as a genomic element for X-inactivation in mouse. Development 136: 139–146.
10. Chaumeil J, Le Baccon P, Wutz A, Heard E (2006) A novel role for Xist RNA in the formation of a repressive nuclear compartment into which genes are recruited when silenced. Genes Dev 20: 2223–2237.
11. Wutz A, Rasmussen TP, Jaenisch R (2002) Chromosomal silencing and localization are mediated by different domains of Xist RNA. Nat Genet 30: 167–174.
12. Silva J, Mak W, Zvetkova I, Appanah R, Nesterova TB, et al. (2003) Establishment of histone h3 methylation on the inactive X chromosome requires transient recruitment of Eed-Enx1 polycomb group complexes. Dev Cell 4: 481–495.
13. Plath K, Talbot D, Hamer KM, Otte AP, Yang TP, et al. (2004) Developmentally regulated alterations in Polycomb repressive complex 1 proteins on the inactive X chromosome. J Cell Biol 167: 1025–1035.
14. Plath K, Fang J, Mlynarczyk-Evans SK, Cao R, Worringer KA, et al. (2003) Role of histone H3 lysine 27 methylation in X inactivation. Science 300: 131–135.
15. Mak W, Baxter J, Silva J, Newall AE, Otte AP, et al. (2002) Mitotically stable association of polycomb group proteins eed and enx1 with the inactive x chromosome in trophoblast stem cells. Curr Biol 12: 1016–1020.
16. Kohlmaier A, Savarese F, Lachner M, Martens J, Jenuwein T, et al. (2004) A chromosomal memory triggered by Xist regulates histone methylation in X inactivation. PLoS Biol 2: E171. doi:10.1371/journal.pbio.0020171.
17. Cao R, Wang L, Wang H, Xia L, Erdjument-Bromage H, et al. (2002) Role of histone H3 lysine 27 methylation in Polycomb-group silencing. Science 298: 1039–1043.
18. Kuzmichev A, Nishioka K, Erdjument-Bromage H, Tempst P, Reinberg D (2002) Histone methyltransferase activity associated with a human multiprotein complex containing the Enhancer of Zeste protein. Genes Dev 16: 2893–2905.
19. Rinn JL, Kertesz M, Wang JK, Squazzo SL, Xu X, et al. (2007) Functional demarcation of active and silent chromatin domains in human HOX loci by noncoding RNAs. Cell 129: 1311–1323.
20. Cao R, Zhang Y (2004) The functions of E(Z)/EZH2-mediated methylation of lysine 27 in histone H3. Curr Opin Genet Dev 14: 155–164.
21. Zhao J, Sun BK, Erwin JA, Song JJ, Lee JT (2008) Polycomb proteins targeted by a short repeat RNA to the mouse X chromosome. Science 322: 750–756.
22. Duszczyk MM, Zanier K, Sattler M (2008) A NMR strategy to unambiguously distinguish nucleic acid hairpin and duplex conformations applied to a Xist RNA A-repeat. Nucleic Acids Res 36: 7068–7077.
23. Kalantry S, Mills KC, Yee D, Otte AP, Panning B, et al. (2006) The Polycomb group protein Eed protects the inactive X-chromosome from differentiation-induced reactivation. Nat Cell Biol 8: 195–202.
24. Gavory G, Symmons MF, Krishnan Ghosh Y, Klenerman D, Balasubramanian S (2006) Structural analysis of the catalytic core of human telomerase RNA by FRET and molecular modeling. Biochemistry 45: 13304–13311.
25. Perez I, McAfee JG, Patton JG (1997) Multiple RRMs contribute to RNA binding specificity and affinity for polypyrimidine tract binding protein. Biochemistry 36: 11881–11890.
26. Auweter SD, Allain FH (2008) Structure-function relationships of the polypyrimidine tract binding protein. Cell Mol Life Sci 65: 516–527.
27. Deckert J, Hartmuth K, Boehringer D, Behzadnia N, Will CL, et al. (2006) Protein composition and electron microscopy structure of affinity-purified human spliceosomal B complexes isolated under physiological conditions. Mol Cell Biol 26: 5528–5543.
28. Pandey RR, Mondal T, Mohammad F, Enroth S, Redrup L, et al. (2008) Kcnq1ot1 antisense noncoding RNA mediates lineage-specific transcriptional silencing through chromatin-level regulation. Mol Cell 32: 232–246.
29. Mougin A, Gregoire A, Banroques J, Segault V, Fournier R, et al. (1996) Secondary structure of the yeast Saccharomyces cerevisiae pre-U3A snoRNA and its implication for splicing efficiency. RNA 2: 1079–1093.

PLoS Biology | www.plosbiology.org

15

January 2010 | Volume 8 | Issue 1 | e1000276


Xist A Region Structure and PRC2 Association

30. Jaeger JA, Turner DH, Zuker M (1989) Improved predictions of secondary structures for RNA. Proc Natl Acad Sci U S A 86: 7706–7710. 31. Zhou Z, Sim J, Griffith J, Reed R (2002) Purification and electron microscopic visualization of functional human spliceosomes. Proc Natl Acad Sci U S A 99: 12203–12207. 32. Dignam JD, Martin PL, Shastry BS, Roeder RG (1983) Eukaryotic gene transcription with purified components. Methods Enzymol 101: 582–598.

PLoS Biology | www.plosbiology.org

33. Jacquenet S, Mereau A, Bilodeau PS, Damier L, Stoltzfus CM, et al. (2001) A second exon splicing silencer within human immunodeficiency virus type 1 tat exon 2 represses splicing of Tat mRNA and binds protein hnRNP H. J Biol Chem 276: 40464–40475. 34. Sambrook J, Fritsch EF, Maniatis T (1989) Molecular cloning. A laboratory manual. New York: Cold Spring Harbor Laboratory Press.

16

January 2010 | Volume 8 | Issue 1 | e1000276


Poised Transcription Factories Prime Silent uPA Gene Prior to Activation

Carmelo Ferrai1,2,†, Sheila Q. Xie2,†, Paolo Luraghi1,¤a, Davide Munari1,¤b, Francisco Ramirez3, Miguel R. Branco2,¤c, Ana Pombo2,*, Massimo P. Crippa1,*

1 Laboratory of Molecular Dynamics of the Nucleus, Division of Genetics and Cell Biology, S. Raffaele Scientific Institute, Milan, Italy
2 Medical Research Council Clinical Sciences Centre, Imperial College School of Medicine, Hammersmith Hospital Campus, London, United Kingdom
3 South Ruislip, Middlesex, United Kingdom

Abstract

The position of genes in the interphase nucleus and their association with functional landmarks correlate with active and/or silent states of expression. Gene activation can induce chromatin looping from chromosome territories (CTs) and is thought to require de novo association with transcription factories. We identify two types of factory: “poised transcription factories,” containing RNA polymerase II phosphorylated on Ser5, but not Ser2, residues, which differ from “active factories” associated with phosphorylation on both residues. Using the urokinase-type plasminogen activator (uPA) gene as a model system, we find that this inducible gene is predominantly associated with poised (S5p+S2p−) factories prior to activation and localized at the CT interior. Shortly after induction, the uPA locus is found associated with active (S5p+S2p+) factories and loops out from its CT. However, the levels of gene association with poised or active transcription factories, before and after activation, are independent of locus positioning relative to its CT. RNA-FISH analyses show that, after activation, the uPA gene is transcribed with the same frequency at each CT position. Unexpectedly, prior to activation, the uPA loci internal to the CT are seldom transcriptionally active, while the smaller number of uPA loci found outside their CT are transcribed as frequently as after induction. The association of inducible genes with poised transcription factories prior to activation is likely to contribute to the rapid and robust induction of gene expression in response to external stimuli, whereas gene positioning at the CT interior may be important to reinforce silencing mechanisms prior to induction.

Citation: Ferrai C, Xie SQ, Luraghi P, Munari D, Ramirez F, et al. (2009) Poised Transcription Factories Prime Silent uPA Gene Prior to Activation. PLoS Biol 8(1): e1000270. doi:10.1371/journal.pbio.1000270

Academic Editor: Tom Misteli, National Cancer Institute, United States of America

Received July 14, 2009; Accepted November 12, 2009; Published January 5, 2010

Copyright: © 2009 Ferrai et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: The work in AP’s laboratory was funded by the Medical Research Council (UK). The work in MPC’s laboratory was supported by grants from Istituto Superiore di Sanità, Italy, Programma Malattie Rare, and Ministero dell’Istruzione, dell’Università della Ricerca, Italy, Progetto Oncologia. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

Abbreviations: CAMK2G, calcium calmodulin-dependent protein kinase II gamma; CT, chromosome territory; CTD, carboxy-terminal domain; FISH, fluorescence in situ hybridization; MN, micrococcal nuclease; MN-ChIP, MN-coupled Chromatin ImmunoPrecipitation; RNAP, RNA Polymerase II; TPA, tetradecanoyl phorbol acetate; uPA, urokinase-type Plasminogen Activator; VCL, vinculin.

* E-mail: ana.pombo@csc.mrc.ac.uk (AP); crippa.massimo@hsr.it (MPC)

¤a Current address: Institute for Cancer Research and Treatment (IRCC), Torino, Italy
¤b Current address: FIRC Institute of Molecular Oncology (IFOM), Milan, Italy
¤c Current address: The Babraham Institute, Babraham Research Campus, Cambridge, United Kingdom

† These authors contributed equally to this work.

Author Summary

The spatial organization of the genome inside the cell nucleus is important in regulating gene expression and in the response to external stimuli. Examples of changing spatial organization are the repositioning of genes outside chromosome territories during the induction of gene expression, and the gathering of active genes at transcription factories (discrete foci enriched in active RNA polymerase). Recent genome-wide mapping of RNA polymerase II has identified its presence at many genes poised for activation, raising the possibility that such genes might associate with poised transcription factories. Using an inducible mammalian gene, urokinase-type plasminogen activator (uPA), and a system in which this gene is poised for expression, we show that uPA associates with poised transcription factories prior to activation. Gene activation induces two independent events: repositioning towards the exterior of its chromosome territory and association with active transcription factories. Surprisingly, genes inside the interior of the chromosome territory prior to activation are less likely to be actively transcribed, suggesting that positioning at the territory interior has a role in gene silencing.

Introduction

The spatial folding of chromatin within the mammalian cell nucleus, from the level of whole chromosomes down to single genomic regions, is thought to contribute to the expression status of genes [1–3]. Mammalian chromosomes occupy discrete domains called chromosome territories (CTs) and have preferred spatial arrangements within the nuclear landscape in specific cell types, which are conserved through evolution [1–3]. Subchromosomal regions containing inducible genes, such as the MHC type II or Hox gene clusters, relocate outside their CTs upon transcriptional activation or when constitutively expressed [4,5].

Genes can preferentially associate with specific nuclear domains according to their expression status. Most noteworthy, gene associations with the nuclear lamina largely correlate with silencing [6–9], whereas gene associations with transcription factories, discrete clusters containing many RNA polymerase II (RNAP) enzymes, have been observed only when genes are actively transcribed, but not during the intervening periods of inactivity [2]. Although CTs do not represent general barriers to the transcriptional machinery [10,11] and transcription can occur inside CTs [3,12–14], the large-scale movements of chromatin, observed in response to gene induction, have often been interpreted as favouring gene associations with compartments permissive for transcription [15–17].

However, inducible genes frequently display an active chromatin configuration and are primed by initiation-competent RNAP complexes prior to induction [18–21]. Complex phosphorylation events at the C-terminal domain (CTD) of the largest subunit of RNAP correlate with initiation and elongation steps of the transcription cycle and are crucial for chromatin remodelling and RNA processing [22,23]. The mammalian CTD is composed of 52 repeats of a heptad consensus sequence Tyr1-Ser2-Pro3-Thr4-Ser5-Pro6-Ser7, and phosphorylation on Ser5 residues (S5p) is associated with transcription initiation and priming, whereas phosphorylation on Ser2 (S2p) correlates with transcriptional elongation [22,23].

To investigate whether primed genes are associated with discrete RNAP sites enriched in RNAP-S5p and the functional relevance of large-scale gene repositioning in promoting associations with the transcription machinery during gene activation, we investigated the expression levels, epigenetic status, nuclear position, and association with RNAP factories of an inducible gene, the urokinase-type plasminogen activator (uPA or PLAU; GeneID 5328), before and after activation. We use antibodies that specifically detect different phosphorylated forms of RNAP to investigate the association of the inducible uPA gene with transcription factories. Prior to induction, most uPA alleles are positioned inside their CT and extensively associated with RNAP sites marked by S5p. Transcriptional activation leads to looping out of the uPA locus from its CT, and increased association with active transcription factories marked by both S5p and S2p. However, the extent of gene association with factories, before and after activation, is independent of the uPA position relative to its CT. Unexpectedly, we find that the majority of uPA genes which are positioned at the CT interior prior to activation are seldom transcribed, in comparison with the few uPA genes located outside the CT which are active with the same frequency as the fully induced uPA genes.

Results/Discussion

Transcriptional Induction of the uPA Locus Promotes Relocation Outside Its CT

The uPA gene encodes a serine protease that promotes cell motility, and its overexpression is known to correlate with cancer malignancies and tumor invasion [24–26]. It is a 6.4 kb gene with 10 introns, and its regulatory regions have been extensively characterized [24]. uPA is located on human chromosome 10, separated from upstream and downstream flanking genes CAMK2G (calcium calmodulin-dependent protein kinase II gamma; GeneID 818) and VCL (vinculin; GeneID 7414) by ~40 and ~80 kb, respectively (Figure 1A). In HepG2 cells, where the uPA gene is present as a single copy, its transcription can be induced through various stimuli, including treatment with phorbol esters [27]. Tetradecanoyl phorbol acetate (TPA) induces its expression by ~100-fold in HepG2 cells after 3 h of treatment (Figure 1B). The induction of the uPA gene within this short time of activation occurs in all cells of the population, as shown by immunofluorescence detection of uPA protein in single cells (Figure 1C); low levels of uPA protein are detected in a small proportion (7%) of the cell population prior to activation.

We first investigated whether transcriptional induction of the uPA gene was associated with large-scale repositioning relative to its CT, using a whole chromosome 10 probe together with a BAC probe containing the uPA locus (Figure 1A). We performed fluorescence in situ hybridization on ultrathin (~150 nm) cryosections (cryoFISH), a method that preserves chromatin structure and organisation of transcription factories. Cells are fixed using improved formaldehyde fixation in comparison with standard 3D-FISH, which is particularly important for the preservation of chromatin structure and RNAP distribution [14,28]. CryoFISH also provides sensitivity of detection and high spatial resolution, especially in the z axis [14,29,30]. We find that, in the inactive state, the uPA locus is preferentially localized at the CT interior (60% loci inside or at the inner-edge, n = 166 loci) and relocates to the exterior upon activation (55% loci at outer-edge or outside, n = 208 loci; χ² test, p < 0.0001; Figure 1D), concomitant with the 100-fold induction of mRNA levels determined by qRT-PCR (Figure 1B). Thus, we observed a striking change in the position of the uPA locus relative to its CT upon TPA activation, which correlates with a major increase in mRNA and protein expression across the whole population of cells.

Figure 1. Activation of the uPA gene by TPA treatment induces large-scale chromatin repositioning in the nucleus of HepG2 cells. (A) Diagram illustrating the genomic context of the uPA locus and the genomic region detected by the BAC probe (RP11-417O11; ~228 kb) used for FISH experiments. CAMK2G, calcium-calmodulin-dependent protein kinase II gamma; VCL, vinculin. Arrows indicate the 5′-3′ transcription direction. (B) Kinetics of the transcriptional induction of the uPA gene with TPA. uPA RNA expression was assessed by quantitative RT-PCR after treatment with TPA for different times. Values were normalized to 18S rRNA and expressed relative to uninduced cells. (C) Induction of uPA protein expression with TPA. Indirect immunofluorescence analyses of uPA protein expression (red) before and after TPA treatment (3 h). Nuclei were counterstained with DAPI (blue). The proportion of uPA-expressing cells is indicated (n = 188 and 144 for untreated and treated cells, respectively). Bar: 20 µm. (D) TPA induces large-scale repositioning of the uPA locus (green) to the exterior of its own chromosome 10 territory (CT10; red). The position of the uPA locus relative to CT10 was determined in HepG2 cells, before and after TPA activation for 3 h, by cryoFISH using a whole chromosome 10 paint (red) and the digoxigenin-labelled BAC probe (green) represented in (A). Nucleic acids were counterstained with TOTO-3 (blue). Arrowheads indicate uPA loci. The positions of uPA loci relative to the CT were scored as “inside,” “inner-edge,” “edge,” “outer-edge,” and “outside”; error bars represent standard deviations. Bars: 2 µm. doi:10.1371/journal.pbio.1000270.g001

Induction of the uPA Locus and Its Nuclear Relocation Are Independent of Local Chromatin Decondensation

Chromatin repositioning in response to gene activation has often been associated with changes in chromatin structure and degree of condensation [15,17]. To establish whether the large-scale relocation of the uPA locus during its transcriptional activation was accompanied by changes between closed and open chromatin conformations, we next assessed the chromatin structure of the uPA gene before and after TPA treatment (Figure 2). Micrococcal nuclease (MN) digestion of crosslinked, sonicated chromatin yields a decreasing nucleosomal ladder before and after TPA activation (Figure 2A and images unpublished, respectively; [31]). Systematic PCR amplification at and around the uPA regulatory regions revealed two populations of genomic DNA fragments that resist processive cleavage at high digestion time points (50 min; see also [31]). At the enhancer, the size of fragments is typically mononucleosomal (~150 bp; fragment E1; Figure 2B, 2C). At the promoter, the protected fragments have larger sizes (>300 bp; fragments P and Px; Figure 2B, 2C). This feature is consistent with the presence of RNAP-containing complexes at the promoter, which was previously observed at the transcriptionally active uPA gene in constitutively expressing cells, but absent after α-amanitin treatment [31]. The same population of larger promoter fragments was also detected in uninduced cells (Figure 2C), showing that the uPA gene displays transcription-associated features before activation.

This was supported by an investigation of the epigenetic status of chromatin before and after activation. High resolution, MN-coupled chromatin immunoprecipitation (MN-ChIP; [31]) using antibodies specific for histone modifications associated with closed (H3K9me2) or open (H3K4me2, H3K9ac, H3K14ac) chromatin [32] showed the presence of active, but not silent, chromatin marks at the promoter and enhancer of the uPA gene, both before and after TPA induction (Figure 2D). Positive detection of H3K9me2 was confirmed at the imprinted H19 gene (GeneID 283120; Figure S1). Upon activation, the larger promoter fragment (P) is no longer detected with H3K14ac antibodies, although this mark is still present at the smaller promoter fragments (uP and dP), and an enrichment of H3K9ac at the E5 fragment on the enhancer was also detected (Figure 2D). These changes are likely to reflect the presence of different populations of resistant fragments at the uPA regulatory regions upon induction. Taken together, these findings show that the uPA gene adopts an open chromatin state before transcriptional activation, which is maintained after induction. The large-scale chromatin repositioning of the uPA locus relative to its CT (Figure 1D) cannot therefore be explained by changes from closed to open chromatin conformation.

Figure 2. The regulatory regions of the uPA locus display an open chromatin conformation before and after TPA activation. (A) Micrococcal nuclease (MN) digestion progressively cleaves cross-linked, sonicated bulk chromatin to mononucleosomes. Material from the 50 min digestion time-point was used for MN-ChIP experiments. (B) Diagram illustrating the regulatory region of the uPA gene, including the enhancer (E), promoter (P), and the position of amplified genomic fragments at and around the enhancer and promoter regions. Roman numerals indicate exons in the coding region (white and black boxes represent untranslated and translated regions, respectively). (C) Chromatin-associated features of the uPA enhancer and promoter regions as revealed by PCR amplification patterns of MN-digested chromatin DNA before and after transcriptional activation. HepG2 cells were grown ±TPA for 3 h, before chromatin preparation. Enhancer fragment 1 (E1), but not E2 and E3, is resistant to MNase cleavage after 50 min of digestion. Two promoter fragments larger than a single nucleosome (P and Px, 320 bp and 464 bp, respectively) are detected at the same digestion time point in the promoter region, consistent with protection due to additional bound proteins such as transcription factors and RNAP [31]. (D) MN-ChIP experiments identify histone modifications associated with open chromatin at the enhancer and promoter regions before and after TPA activation. Chromatin was cross-linked, sonicated, and treated with MN for 50 min, prior to immunoprecipitation with antibodies against specific histone modifications associated with open (H3K4me2, H3K9ac, H3K14ac) or closed (H3K9me2) chromatin. Control immunoprecipitations were performed with polyclonal anti-uPA receptor (uPAR) antibodies (unrelated). To improve the resolution of MN-ChIP in the promoter region, two smaller, overlapping genomic fragments spanning the entire P fragment (uP and dP, 140 and 199 bp, respectively) were also amplified. doi:10.1371/journal.pbio.1000270.g002

Inactive uPA Genes Are Associated with Poised Transcription Factories Prior to Induction

The presence of RNAP phosphorylated on Ser5 residues at promoter regions of silent genes defines them as paused or poised genes [21–23]. To determine whether the inactive uPA gene was associated with RNAP prior to induction, we used MN-ChIP and antibodies specific for phosphorylated forms of RNAP that can discriminate between active and paused/poised RNAP complexes [21–23]. Antibody specificity has been extensively characterized previously [21,33] and was confirmed in HepG2 cells using Western blotting and immunofluorescence (Figure S2). MN-ChIP detected the initiating (S5p) but not the elongating (S2p) form of RNAP at the promoter and enhancer regions of the uPA gene prior to induction (Figure 3A), demonstrating that the uPA gene is primed with RNAP before activation. The detection of RNAP-S5p at the enhancer (Figure 3A) can be explained by an interaction with RNAP bound at the promoter, as previously observed in constitutively active uPA genes [31].

Previous studies describing the presence of RNAP at the promoters of paused or poised genes did not investigate a possible association with transcription factories marked specifically by the S5p modification [18,19,21,34]. We asked whether the association of primed uPA loci with RNAP-S5p could be detected at the single cell level and occur within specific nuclear substructures, using immuno-cryoFISH [14]. Using a BAC probe covering a genomic region centred on the uPA gene (Figure 1A) in combination with immunolabelling of the S5p or S2p forms of RNAP, we found that the vast majority of uPA loci were associated with sites containing RNAP-S5p prior to activation (87% ± 9%, n = 165 loci; Figure 3B, 3C). A significantly lower proportion was found associated with RNAP-S2p foci (31% ± 5%, n = 170 loci; χ² test, p < 0.0001; Figure 3C). The primed uPA loci are therefore preferentially associated with a subpopulation of RNAP factories that contain RNAP-S5p, but not RNAP-S2p, prior to TPA activation. We call these sites “poised,” or S5p+S2p−, transcription factories.

Scoring criteria for gene association with RNAP sites, typically used in the analyses of 3D-FISH results, often rely on proximity criteria that do not involve true physical associations, being sensitive to the limited z axis resolution (>500 nm) of standard confocal microscopes. This is particularly important when analysing highly abundant structures such as transcription factories which can exist at densities of 20/µm³ [35]. Although the use of ultrathin (~150 nm) cryosections mostly detects single factories [29], we were still concerned that the extent of uPA gene association with transcription factories marked by S5p or S2p observed experimentally (Figure 3C) could be due to different abundance of the two modifications and might be explained, at least in part, by random processes. To assess the impact of these two constraints, we generated one simulated uPA signal, for each experimental image, with the same number of pixels as the experimental site, but positioned at random coordinates within the nucleoplasm. Next, we measured the frequency of association of the randomly positioned loci with RNAP-S5p or -S2p sites (Figure S3B, S3C). We found that the association of randomly positioned BAC signals with S5p was 54% ± 8% (Figure S3C; n = 68 loci), a significantly lower number than the experimental value of 87% ± 9% for the uPA locus (Figure 3C; χ² test, p < 0.0001). In contrast, the association of randomly positioned signals with S2p was 39% ± 5% (n = 69 loci), similar to the experimental value of 31% ± 5% (χ² test, p = 0.29; Figure S3C). One caveat of these analyses is that the observation of similar levels of association for experimental or simulated loci with transcription factories prior to activation cannot be used to argue that this association is not specific, but simply that it is as low as it would be if loci were positioned randomly.

In summary, our results show that the association of a large proportion of uPA loci with poised, S5p+S2p− factories before activation is specific, although the nuclear environment where the uPA loci are located is not devoid of active, S5p+S2p+ transcription factories, and therefore seems to be permissive for transcription.
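The random-positioning control described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' analysis code: each image is assumed to have been reduced to sets of (row, column) pixel coordinates for the nucleoplasm mask and for the RNAP sites, the locus shape is represented as pixel offsets, and every function name and the toy data are hypothetical. A simulated locus is scored as associated when it shares at least one pixel with an RNAP site, the criterion given in the Figure 3 legend.

```python
import random


def place_random_locus(locus_offsets, nucleoplasm, rng, max_tries=10_000):
    """Shift the locus shape to a random position lying fully inside the nucleoplasm."""
    candidates = sorted(nucleoplasm)  # sorted so that a fixed seed is reproducible
    for _ in range(max_tries):
        r0, c0 = rng.choice(candidates)
        placed = {(r0 + dr, c0 + dc) for dr, dc in locus_offsets}
        if placed <= nucleoplasm:  # keeps the same pixel count as the real signal
            return placed
    raise ValueError("locus shape does not fit inside the nucleoplasm mask")


def is_associated(locus_pixels, rnap_pixels):
    """Associated when the two signals overlap by at least one pixel."""
    return not locus_pixels.isdisjoint(rnap_pixels)


def simulated_association_frequency(images, seed=0):
    """Place one simulated locus per image and return the fraction that is associated."""
    rng = random.Random(seed)
    hits = 0
    for locus_offsets, nucleoplasm, rnap_pixels in images:
        placed = place_random_locus(locus_offsets, nucleoplasm, rng)
        hits += is_associated(placed, rnap_pixels)
    return hits / len(images)


# Toy data (hypothetical): a 20 x 20 pixel nucleoplasm, a 2 x 2 pixel locus,
# and a single 3 x 3 pixel RNAP site.
nucleus = {(r, c) for r in range(20) for c in range(20)}
locus = [(0, 0), (0, 1), (1, 0), (1, 1)]
sites = {(r, c) for r in range(5, 8) for c in range(5, 8)}
print(simulated_association_frequency([(locus, nucleus, sites)] * 200))
```

The simulated frequency obtained this way is the baseline against which an experimental association frequency would be compared; the denser the RNAP sites, the higher that baseline becomes.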

The uPA Gene Associates with Active Transcription Factories upon Activation

To investigate the active state of the uPA gene and the engagement of the locus with active factories following transcriptional induction, we repeated the MN-ChIP and immuno-cryoFISH analyses for TPA-treated cells (Figure 3D–F). MN-ChIP showed that the enhancer and the promoter of the uPA gene are associated with the elongating (S2p) form of RNAP either together with RNAP-S5p (at the promoter) or exclusively (at the enhancer; Figure 3D). The absence of RNAP-S5p at the enhancer fragments analysed, in the presence of S2p, suggests that the enhancer may maintain an association with RNAP as it moves through the coding region during elongation, where S5p is known to decrease and S2p to augment [23]. Immuno-cryoFISH after TPA induction (Figure 3E, 3F) showed that the activated uPA locus now becomes associated with RNAP-S2p sites (72% ± 7% loci, n = 183), while maintaining an association with RNAP-S5p (71% ± 8%, n = 140 loci; χ² test, p = 0.98), consistent with the MN-ChIP results. The approximately 2-fold increase in association with RNAP-S2p from 31% to 72%, before and after TPA treatment, respectively, was highly statistically significant (χ² test, p < 0.0001). Evaluation of simulated uPA loci positioned at random coordinates in the same experimental images showed that the increased association of the uPA locus with S2p observed after activation (72%) cannot be explained by random processes, as the frequency of association of simulated loci remained at 38% ± 2% (n = 75 loci; χ² test, p < 0.0001; Figure S3B, S3C). An increased association with S2p sites upon activation has also recently been described for the Hoxb1 gene in mouse embryonic stem cells, albeit at lower frequency [36]. Taken together, we show that the activation of the uPA gene and large-scale repositioning of the locus relative to its CT coincide with the acquisition of the S2p modification of RNAP without major changes in chromatin structure.
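The 2 × 2 comparisons of association frequencies reported above can be approximated from the published percentages. The Python sketch below is an illustration and not the authors' code. The counts are back-calculated from the rounded percentages and locus numbers (for example, 31% of 170 ≈ 53 and 72% of 183 ≈ 132), so the resulting values are only approximate. The text does not state whether a continuity correction was applied, so the `yates` flag is an assumption, and the function name is hypothetical.

```python
import math


def chi2_2x2(a, b, c, d, yates=False):
    """Chi-square test (1 degree of freedom) for the table [[a, b], [c, d]].

    Returns (chi2, p). For 1 d.f. the upper tail is erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:  # optional continuity correction (assumption, see lead-in)
        diff = max(0.0, diff - n / 2)
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = n * diff ** 2 / denom
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Rows: before / after TPA. Columns: associated / not associated.
# S2p: ~53 of 170 loci before, ~132 of 183 loci after (back-calculated counts).
chi2_s2p, p_s2p = chi2_2x2(53, 170 - 53, 132, 183 - 132)
print(f"S2p before vs after: chi2 = {chi2_s2p:.1f}, p = {p_s2p:.1e}")

# S5p: ~144 of 165 loci before, ~99 of 140 loci after (back-calculated counts).
chi2_s5p, p_s5p = chi2_2x2(144, 165 - 144, 99, 140 - 99, yates=True)
print(f"S5p before vs after: chi2 = {chi2_s5p:.1f}, p = {p_s5p:.1e}")
```

With these approximate counts the S2p comparison falls well below p = 0.0001, in line with the reported significance; exact agreement with the published p values should not be expected because the true counts are not given in the text.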

Figure 3. Inactive uPA loci associate with poised transcription factories rich in the initiating form of RNAP (S5p), prior to activation. (A, D) MN-ChIP analyses detect the initiating (S5p) form of RNAP bound at the uPA promoter and enhancer before TPA activation (A). Following activation the enhancer is associated with the elongating form of RNAP (S2p), while the promoter is associated with both forms (S5p and S2p) of the enzyme (D). HepG2 cells were grown ±TPA for 3 h, before chromatin preparation and MN-ChIP with antibodies specific for S5p and S2p forms of RNAP. Control (unrelated) antibodies were polyclonal anti-uPA receptor antibodies. (B, E) The position of the uPA locus (red) relative to S5p and S2p sites (green) was determined by immuno-cryoFISH before (B) and after (E) TPA activation for 3 h, using a rhodamine-labelled BAC probe containing the uPA locus and antibodies specific for RNAP phosphorylated on S5 or S2 residues. The association of uPA genes with RNAP (S5p or S2p) was scored as “associated” (signals overlap by at least 1 pixel) or “separated” (signals do not overlap or are adjacent; see Figure S8 for additional examples). Arrowheads indicate the position of uPA loci. Nucleic acids were counterstained with TOTO-3 (blue). Bars: 2 µm. (C, F) Frequency of association of uPA loci with RNAP-S5p or -S2p sites before (C) and after (F) TPA activation. The decrease in association with S5p factories and the increase in association with S2p factories observed after activation were both statistically significant (χ² tests, p = 0.0006 and p < 0.0001, respectively). doi:10.1371/journal.pbio.1000270.g003

Factories Containing S2p Are Also Marked by the S5p Modification and Are Less Abundant Than Sites Marked by S5p

The striking agreement between the number of uPA loci associated with RNAP factories marked by S5p and S2p upon activation and the co-existence of the two RNAP modifications detected by MN-ChIP at the promoter suggest that active factories contain both modifications, as expected from concomitant initiation and elongation events at promoter and coding regions of highly active genes. To further investigate whether poised transcription factories marked by S5p alone are distinct from the active factories marked by S2p, we compared the number of RNAP-S5p and RNAP-S2p sites in HepG2 cells, before and after induction (Figure 4A–C). RNAP sites marked by S5p are significantly more abundant than RNAP sites marked by S2p (28% excess) both before and after activation (Student t test, p < 0.0001 in both cases; Figure 4A–C), suggesting that a considerable number of transcription factories adopt the poised state. The excess number of sites containing S5p in the absence of S2p (Figure 4A–C) is consistent with recent reports identifying an abundance of primed genes [34,37–39] marked by RNAP-S5p and not RNAP-S2p [21] in embryonic stem cells or differentiated cells.

To investigate to what extent S2p sites are also marked by S5p, we used an antibody-blocking assay ([30,40]; Figure 4D–G), in which sections were first incubated with antibodies against RNAP-S5p before incubation with antibodies against RNAP-S2p. Simultaneous incubation with the two antibodies resulted in a ~65% quenching of the detection of RNAP-S2p (images unpublished), due to a more efficient binding of 4H8, an IgG, in comparison with H5, an IgM. The rationale of antibody blocking experiments is that the binding of the first antibody prevents binding by the second, if the two respective epitopes are located within the distance corresponding to the size of the first bound antibody complex (Ser2 and Ser5 residues are separated by two amino acids whereas IgGs are large proteins that measure ~9 nm). After pre-incubation with the specific S5p antibody (4H8), the overall intensity of RNAP-S2p sites was significantly reduced throughout the nucleoplasm, except in discrete interchromatin domains (Figure 4E, 4G), as compared to sections not incubated with this antibody (Figure 4D, 4G) or to sections incubated with an unrelated (anti-biotin) antibody (Figure 4F, 4G). A transcriptionally silent population of RNAP-S2p complexes is known to be stably accumulated in splicing speckles [33,41], which are nuclear domains enriched in splicing machinery and polyA+ RNAs, and which may be important for post-transcriptional splicing of complex RNAs [42]. The reverse antibody-blocking experiment also confirmed the colocalisation between S2p and S5p sites but produced lower levels of signal depletion (unpublished), as expected due to the larger abundance of S5p sites (Figure 4C).

The results from antibody-blocking experiments suggest that most nucleoplasmic S2p sites outside interchromatin clusters also contain the S5p modification, as expected for simultaneous initiation and elongation events on the same gene during cycles of active transcription. Furthermore, S5p-containing structures are in excess of the active factories, demonstrating the presence of discrete sites marked solely by S5p, which represent poised, S5p+S2p− transcription factories.
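The steric rationale of the antibody-blocking assay can be checked with a short back-of-the-envelope calculation. The Python sketch below is illustrative only: it assumes an upper bound of about 0.38 nm between consecutive alpha carbons in a fully extended polypeptide (a textbook approximation, not a value from this article) and takes the ~9 nm IgG size quoted in the text. The constant and function names are hypothetical.

```python
CA_SPACING_NM = 0.38  # assumed maximum C-alpha to C-alpha spacing per residue step
IGG_SIZE_NM = 9.0     # approximate IgG dimension quoted in the text


def max_separation_nm(residue_a, residue_b):
    """Upper bound on the distance between two residues of an extended chain."""
    return abs(residue_b - residue_a) * CA_SPACING_NM


# Ser2 and Ser5 within one heptad (Tyr1-Ser2-Pro3-Thr4-Ser5-Pro6-Ser7).
same_repeat = max_separation_nm(2, 5)
# Ser5 of one heptad and Ser2 of the following heptad (position 7 + 2).
next_repeat = max_separation_nm(5, 7 + 2)

for label, dist in (("same repeat", same_repeat), ("adjacent repeats", next_repeat)):
    print(f"{label}: at most {dist:.2f} nm, IgG is at least {IGG_SIZE_NM / dist:.1f}x larger")
```

Under these assumptions the two epitopes lie far closer together than the size of a bound IgG, which is the condition the blocking assay relies on.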

Analysis of spliced transcripts of CAMK2G and VCL confirms their active state prior to TPA induction and demonstrates similar effects upon activation (unpublished). Interestingly, low levels of uPA primary transcripts sensitive to α-amanitin treatment are detected prior to activation (Figure 5B), consistent with the detection of uPA protein in a small percentage of HepG2 cells before TPA treatment (Figure 1C). The small and disparate changes in the RNA levels of the two genes flanking uPA are in line with a recent investigation of the Hoxb cluster in mouse ES cells, in which the Cbx1 gene, 400 kb downstream of the Hoxb cluster, does not change expression levels in spite of increased chromatin repositioning relative to the CT [36]; here, however, the changes occur at much shorter genomic distances. The behaviour of the uPA flanking genes also agrees with a broader analysis of expression changes across a whole 300 kb region, which undergoes repositioning in response to murine transgenic integration of the β-globin locus-control region, where the expression levels of many genes do not change between the two states [43]. As the levels of primary transcripts at each gene in the locus before and after TPA induction may depend on complex parameters such as the frequency and speed of RNAP elongation, the stability of unprocessed transcripts, and the rate of intron splicing, we investigated whether TPA activation influenced the levels of association of each gene with S5p and S2p sites, using fosmid probes that cover ~42–46 kb of genomic sequence (Figure 5A). Measurements of the diameters of fosmid and BAC signals yielded average values of 353 nm for the uPA fosmid, in comparison with 586 nm for the BAC probe, which demonstrates a significant improvement in spatial resolution. We find that CAMK2G and VCL are extensively associated with both S5p and S2p sites and to a similar extent, irrespective of TPA treatment (association frequency between 62% and 82%; Figure 5C, 5D).
Importantly, the relatively small changes in the levels of primary CAMK2G and VCL transcript upon TPA treatment (Figure 5B) are not reflected by detectable changes in their association with either S5p or S2p sites. This suggests that TPA activation does not influence the extent of CAMK2G and VCL association with the transcription machinery, and thus their state of activity is unlikely to have a major role in the relocation of the uPA locus from its territory. Similar analyses of uPA gene association with S5p and S2p sites using a fosmid probe (Figure 5C, 5D) confirm the results obtained with the larger BAC probe (Figure 3). Prior to induction, the gene is extensively associated with S5p sites (79% ± 5%; n = 91; Figure 5C), but not with S2p sites (36% ± 6%; n = 254; χ² test, p < 0.0001; Figure 5D). Upon activation, the uPA fosmid probe associates with S5p and S2p sites to a similar extent (i.e., 75% ± 9% and 71% ± 5%, n = 93 and 225, respectively; χ² test, p = 0.48; Figure 5C, 5D) and at the same levels observed with the BAC probe (~70%; Figure 3F). Analyses of simulated fosmid signals (Figure S3D, S3E) support the notion that the association of fosmid signals with S5p sites prior to induction, or with both S5p and S2p after activation, is not explained by random processes (χ² test comparisons between experimental and simulated association: p(S5p/−TPA) = 0.0007, p(S5p/+TPA) = 0.014, p(S2p/+TPA) < 0.0001), whereas the association with S2p sites prior to induction can be (p(S2p/−TPA) = 0.62).
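The χ² comparisons of association frequencies quoted above can be sketched as a test on a 2 × 2 table of "associated" versus "separated" loci. The sketch below is illustrative only: the counts are reconstructed from the rounded percentages and sample sizes given in the text (an assumption, since raw counts are not reported here), the test is run without a continuity correction (also an assumption), and the helper names `chi2_2x2` and `counts` are hypothetical.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square test (1 d.f., no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

def counts(percent, n):
    """Approximate (associated, separated) counts from a rounded percentage (assumption)."""
    k = round(percent / 100 * n)
    return k, n - k

# Before induction: S5p (79%, n = 91) versus S2p (36%, n = 254)
before = chi2_2x2(*counts(79, 91), *counts(36, 254))
# After induction: S5p (75%, n = 93) versus S2p (71%, n = 225)
after = chi2_2x2(*counts(75, 93), *counts(71, 225))

print("before: chi2 = %.2f, p = %.2g" % before)
print("after:  chi2 = %.2f, p = %.2g" % after)
```

With these reconstructed counts the first comparison falls well below p = 0.0001 and the second is not significant, in line with the reported values; exact agreement with the published p-values should not be expected because of rounding.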

uPA Flanking Genes, CAMK2G and VCL, Are Transcriptionally Active and Associated with Active Transcription Factories Independently of TPA Activation

We have shown large-scale repositioning of the uPA locus following TPA treatment of HepG2 cells (Figure 1D). The short genomic separation between the uPA gene and its neighbouring genes, CAMK2G and VCL (40 kb and 80 kb, respectively; Figure 5A), led us to investigate in more detail how the TPA treatment affected the transcriptional state of the three genes, by comparing the levels of unprocessed transcripts and their association with S5p and S2p factories (Figure 5). The levels of primary transcripts, produced before and after TPA treatment, were determined by qRT-PCR with primers that amplify the exon1-intron1 junction, using total RNA extracted from HepG2 cells (Figure 5B); cells were treated in parallel with α-amanitin, an inhibitor of RNAP transcription, to discriminate populations of newly made from stable transcripts. Abundant detection of primary transcripts above α-amanitin levels shows that CAMK2G and VCL are actively transcribed prior to TPA activation, whereas uPA primary transcripts are detected only at low levels (Figure 5B; see also Figure 1B, 1C). The levels of CAMK2G and VCL primary transcripts decrease by 2.8-fold and increase by 1.5-fold, respectively, upon TPA treatment, whereas uPA primary transcripts increase by ~11-fold (Mann-Whitney U test, p = 0.05 for the three genes; n = 3 independent replicates).
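With three replicates per condition, the smallest p-value that an exact Mann-Whitney U test can return is 1/20 = 0.05 one-sided (0.1 two-sided), because there are only 20 ways to split six values into two groups of three. This may be why the same p = 0.05 is quoted for all three genes, although whether a one-sided exact test was used is an assumption here. The sketch below enumerates the splits directly; the transcript levels are hypothetical and the function name is illustrative.

```python
from itertools import combinations

def exact_one_sided_p(x, y):
    """Exact one-sided Mann-Whitney p-value, P(U <= U_observed), assuming no ties."""
    def u_stat(a, b):
        # U counts the (a_i, b_j) pairs in which a_i exceeds b_j.
        return sum(1 for ai in a for bj in b if ai > bj)

    observed = u_stat(x, y)
    pooled = list(x) + list(y)
    hits = total = 0
    # Enumerate every way of assigning len(x) of the pooled values to the first group.
    for idx in combinations(range(len(pooled)), len(x)):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        hits += u_stat(a, b) <= observed
    return hits / total

# Hypothetical relative transcript levels (arbitrary units), three replicates each
minus_tpa = [1.0, 1.2, 0.9]
plus_tpa = [10.5, 11.8, 11.0]
p = exact_one_sided_p(minus_tpa, plus_tpa)
print(p)
```

Any data set in which the two groups of three are completely separated gives this same minimum value, regardless of the size of the fold change.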

PLoS Biology | www.plosbiology.org | January 2010 | Volume 8 | Issue 1 | e1000270


Poised Transcription Factories

Figure 4. Factories containing S2p are also marked by the S5p modification and are less abundant than sites marked by S5p. (A–C) RNAP-S5p sites are more abundant than RNAP-S2p sites. Cryosections (~140 nm thick) from HepG2 cells grown ±TPA for 3 h were indirectly immunolabelled with antibodies specific for RNAP-S5p or -S2p (green), as indicated (A, B). Nuclei were counterstained with TOTO-3 (red). Representative images from TPA-treated cells are shown. RNAP-S5p and -S2p detection was optimised by using the highest concentrations of antibodies that give little detectable background in sections treated with alkaline phosphatase (see Figure S2E, S2F). Bars: 2 µm. Measurement of the number of S5p and S2p sites per unit area in the nucleoplasm (C) reveals a larger population of S5p than S2p sites, both before and after activation (Student t-test, p < 0.0001 for both cases; the numbers of nuclear sections analysed were, for S5p, n = 45 and 41, and for S2p, n = 38 and 46, respectively, for − and +TPA). The decrease in S5p sites after TPA activation is statistically significant (p = 0.012), whereas no statistically significant difference was observed in the number of S2p sites (p = 0.44). (D–F) Most S2p sites also contain the S5p modification. Cryosections were first indirectly immunolabelled in the absence (D) or presence (E) of an antibody against RNAP-S5p (4H8; red) or in the presence of an unrelated anti-biotin antibody (F; red) that detects mitochondria in the cytoplasm (F; arrowheads). After formaldehyde cross-linking to preserve the first immunocomplex, sections were indirectly immunolabelled with an antibody against RNAP-S2p (H5; green). Nucleic acids were counterstained with TOTO-3 (insets), and confocal images were collected using the same settings without signal saturation. Bars: 2 µm. Pre-incubation with 4H8 reduces the intensity of S2p signal throughout the nucleoplasm, except at interchromatin regions (E, arrows), in comparison to control samples incubated in the absence of 4H8 (D). Incubation with anti-biotin control antibody before labelling with H5 antibody (F) has no effect on S2p distribution throughout the nucleoplasm. (G) Measurements of average S2p intensity across the nucleoplasm show a ~3-fold decrease in S2p detection after blocking with 4H8. Omission of 4H8 or pre-incubation with unrelated antibodies does not affect the level of the S2p signal in the nucleoplasm. Dotted line indicates the background intensity in the S2p channel measured from sections incubated with all antibodies except H5 (−H5; images unpublished). Number of nuclear profiles was >20 for each sample. doi:10.1371/journal.pbio.1000270.g004

We were surprised to find that the smaller fosmid probe associates with S2p sites prior to induction to the same extent as the larger BAC probe that also covers the active flanking genes, CAMK2G and VCL (36% and 31%, respectively, χ² test, p = 0.26). Simultaneous detection of fosmid and BAC probes in combination with S2p detection (Figure S4B, S4C) confirmed that fosmid and BAC probes associate with S2p sites to a similar extent prior to activation (37% ± 9% and 28% ± 9%, respectively; n = 46; χ² test, p = 0.37; Figure S4C). Unexpectedly, whilst performing these analyses, we observed that a small proportion of uPA loci detected with the fosmid probe (15% ± 3% and 20% ± 8%, n = 47 and 41, respectively for − and +TPA; χ² test, p = 0.57; Figure S4D) were looped out from the signal labelled by the BAC probe independently of TPA activation, in a manner reminiscent of loci looping out from their CTs, but on a much smaller genomic length scale ([4,5]; Figure 1D). This mechanism provides a rationale for the independent behaviour of neighbouring genes with respect to their association with specific nuclear landmarks, such as shown here for the association with specific RNAP structures. The fosmid-based analyses allowed us to confirm at higher spatial resolution that the uPA gene is preferentially associated with a subpopulation of RNAP factories, which, prior to induction, contain RNAP-S5p, but not RNAP-S2p. After induction the uPA locus is highly associated with both S5p- and S2p-containing RNAP sites, consistent with its active state.

TPA Induction of Gene Expression Is Associated with an Increase in the Number of Active uPA Alleles

In order to investigate whether the extent of uPA gene association with active transcription factories reflects their transcriptional activity, we combined the detection of the uPA locus by DNA-FISH with the visualisation of uPA transcripts by RNA-FISH using five tagged oligoprobes mapping at introns 4, 5, and 10, and exons 8 and 9. We find that the locus is already transcriptionally active prior to activation, with 13% ± 8% of uPA alleles showing an association with RNA-FISH signals (n = 200 loci; Figure 6A), consistent with the detection of uPA protein and transcripts before induction (Figures 1C and 5B, respectively). RNase control experiments confirmed the specificity of the discrete RNA-FISH signals observed within the nucleus (Figure S5). We also show that the frequency of active alleles increases 2-fold, to 27% ± 10%, after TPA treatment (n = 216; χ² test comparison for − and +TPA, p = 0.0022; Figure 6A), in agreement with the 2-fold increase observed in the extent of uPA association with RNAP-S2p (Figure 4B). The extent of uPA gene association with RNA signals after activation (27%) is consistently smaller than its association with S2p sites (70%; Figure 5D). However, it must be considered that the efficiency of detection of newly made transcripts at the site of transcription depends on the abundance of RNAP loading at each single gene, the stability of newly made transcripts at the site of synthesis, and the rate of splicing, and therefore is likely to provide a lower estimate for the frequency of gene activity. Intron lengths at the uPA gene are at most ~900 bp, and small introns can be promptly removed within seconds of synthesis. In contrast, the level of uPA gene association with S2p sites can provide a higher estimate, with the caveat that these levels of association may also reflect, in part, an indirect colocalisation of uPA loci with transcription factories associated with the flanking genes. To verify whether detection of newly synthesized uPA transcripts at the uPA locus occurred concomitantly with its association with active, S2p factories, we performed triple labelling experiments in which we simultaneously detected the uPA locus, uPA transcripts, and S2p active factories (Figure 6B). We found that most uPA loci associated with an RNA signal are also associated with S2p sites both before and after TPA treatment (76% and 71%, n = 70 and 75, respectively; χ² test, p = 0.80; Figure 6B), confirming that S2p sites are active sites of transcription. As expected from the higher levels of uPA gene association with S2p sites than with RNA-FISH signals, we find that of all the uPA loci associated with S2p sites (~70%) only half are also associated with uPA transcripts (unpublished; see also [36]). This difference is likely to reflect technical limitations in the detection of transcripts of short genes containing only small introns. Our analyses of uPA gene association with different phosphorylated forms of RNAP and with newly made transcripts show that the vast majority of uPA alleles are associated with poised S5p+S2p− transcription factories prior to activation. We also identify a small population of alleles transcribed at low levels prior to activation and predominantly associated with sites that are marked by S2p (Figures 5B and 6B). Upon activation, uPA alleles become associated with RNAP sites marked by both S2p and S5p. We consistently find a 2-fold increase in the association of the uPA gene with S2p sites or uPA transcripts (Figures 3C, 3F, 5D, and 6A), identifying an increased frequency of transcription upon TPA induction.

uPA Gene Association with Poised or Active Factories Occurs Irrespectively of the Locus Position Relative to Its CT

We next investigated whether the increased frequency of uPA gene transcription or association with transcription factories was dependent on the locus position relative to its CT, both before and after activation, when the locus is preferentially located at the CT interior and exterior, respectively. We performed triple labelling cryoFISH experiments for chromosome 10, the uPA locus, and transcription factories, before and after activation (Figure 7A–C). Analyses were initially performed with BAC probes, but also confirmed with fosmid probes (Figure S6). As previously observed in double labelling experiments (Figure 1D), the uPA locus was preferentially located at the CT interior and associated with S5p transcription factories before activation (Figure 7B). TPA activation induced the relocation of most uPA loci to the CT exterior and an association with factories marked with both S5p and S2p (Figure 7B). To determine whether the association of the uPA locus with poised or active factories was dependent on its position relative to the CT, we calculated the proportion of uPA locus association with RNAP at each CT position (Figure 7C). For simplicity, the data for the three regions around the edge of the CT (“inner-edge,” “edge,” and “outer-edge”) were pooled into a single region,


Figure 5. CAMK2G and VCL genes are transcriptionally active and associated with active transcription factories independently of TPA activation. (A) Diagram depicting the uPA locus and the location of the fosmid probes used for the detection of CAMK2G (covering ~46 kb of genomic sequence), uPA (~44 kb), and VCL (~42 kb) genes in cryoFISH experiments. Arrows indicate the 5′-3′ transcription direction. (B) Detection of primary transcripts for the CAMK2G, uPA, and VCL genes in HepG2 cells ±TPA, incubated in the presence or absence of the RNAPII inhibitor α-amanitin. Total RNA was extracted and the levels of primary transcripts determined by qRT-PCR using primers that amplify the exon1-intron1 junctions. Error bars represent the standard deviation of three independent replicates. (C, D) Frequency of association of CAMK2G, uPA, and VCL with RNAP-S5p (C) or -S2p (D) sites before and after TPA treatment for 3 h. The position of each fosmid signal relative to S5p and S2p sites was determined by immuno-cryoFISH, using rhodamine-labelled fosmid probes and antibodies specific for RNAP phosphorylated on S5 or S2 residues (images unpublished). The association of uPA and its flanking (CAMK2G and VCL) genes with RNAP (S5p or S2p) was scored as “associated” (signals overlap by at least 1 pixel) or “separated” (signals do not overlap or are adjacent, as in Figure 3B, 3E). The association with S5p sites is similar for all genes across the locus before and after activation (C; n(uPA) = 91 and 93; n(CAMK2G) = 95 and 95; n(VCL) = 90 and 92, for − and +TPA, respectively). CAMK2G and VCL are also associated with S2p sites independently of TPA treatment (D; n(CAMK2G) = 104 and 121; n(VCL) = 79 and 86, for − and +TPA, respectively), whereas the association of the uPA gene with S2p specifically increases upon TPA induction (D; n(uPA) = 254 and 225). The increase in association with S2p factories observed after activation was statistically significant (χ² test, p < 0.0001).
doi:10.1371/journal.pbio.1000270.g005


Figure 6. The induction of uPA gene expression is associated with an increase in the number of active alleles. (A) TPA induction of uPA transcripts at the transcription site. uPA gene transcription was detected in HepG2 cells ±TPA by combining RNA and DNA-FISH in the same cryosection. Sections were first hybridized with a mixture of five Cy3-labelled fiftymer oligonucleotide probes mapping at introns 4, 5, and 10, and exons 8 and 9, and the signal was amplified with fluorescent antibodies. After cross-linking the immunocomplexes detecting uPA RNA, sections were hybridized with the fosmid uPA probe. uPA DNA and RNA signals were scored as “active” (signals overlap or are adjacent to each other) or “inactive” (signals do not overlap) to determine the state of activity of the uPA alleles in ±TPA-treated cells. Arrowheads indicate the position of uPA-DNA (green) associated with or separated from uPA-RNA (red) signals. The use of exon probes results in the detection of specific cytoplasmic RNA signals. Nucleic acids were counterstained with TOTO-3 (blue). Bars: 2 µm. TPA treatment increases the frequency of uPA allele association with uPA RNA signals from 13% to 27% (n = 200 and 216, respectively; p = 0.002). (B) The frequency of co-association of the uPA gene (blue) with RNAP-S2p (green) and uPA-RNA (red) was determined by triple labelling, using the fosmid uPA probe, antibodies specific for RNAP phosphorylated at S2 residues (H5), and Cy3-labelled oligonucleotide probes. The association of active uPA alleles with S2p sites was scored as “associated” (signals overlap by at least 1 pixel) or “separated” (signals do not overlap or are adjacent). Arrowhead indicates the position of uPA-DNA (blue) and uPA-RNA (red) relative to S2p (green). Nucleic acids were counterstained with TOTO-3 (inset). Bar: 2 µm. Most uPA genes that are actively transcribed are also associated with RNAP-S2p, confirming that this RNAP modification marks active factories.
doi:10.1371/journal.pbio.1000270.g006

but analyses of the five regions gave similar results. Surprisingly, we found that the association of the uPA locus with S5p occurred with similar frequency at all locations relative to the CT, independently of TPA activation (n = 90 and 134 loci, respectively; logistic regression analysis, p = 0.20; Figure 7C). This shows that the CT interior is accessible to the transcription machinery and does not preclude the interaction of a primed gene with poised, S5p+S2p− factories. In the case of S2p, we also found that the uPA locus associates with active transcription factories with similar frequencies across the different CT regions before and after TPA activation (n = 151 and 104, respectively; logistic regression analysis, p = 0.10). The increase in association of uPA loci with active factories marked by S2p after TPA is statistically significant (p < 0.0001), but the effect of TPA on the level of association is the same across all positions relative to the CT (p = 0.62; Figure 7C). These results show that the uPA gene associates with poised or active transcription factories with similar frequencies across the different CT regions both before and after transcriptional


Figure 7. Increased association of uPA locus with active transcription factories upon TPA activation is independent of the large-scale repositioning of the uPA locus relative to its chromosome territory. (A) The position of the uPA locus (uPA-DNA, red) relative to chromosome 10 (CT10, blue) and to S5p and S2p sites (green) was determined in HepG2 cells ±TPA activation for 3 h, by immuno-cryoFISH using a digoxigenin-labelled BAC probe containing the uPA locus, whole chromosome 10 paint, and antibodies specific for RNAP phosphorylated on S5 or S2 residues. Arrowheads indicate the position of uPA loci. Nucleic acids were counterstained with DAPI (images unpublished). Bars: 2 µm. (B) The frequencies of uPA loci that are associated with or separated from RNAP-S5p (left hand column) or RNAP-S2p (right hand column) were measured at each position relative to the CT (“inside,” “inner-edge,” “edge,” “outer-edge,” and “outside”) before and after TPA induction. uPA locus association with S5p and S2p sites was detected in all positions analysed. (C) The proportion of uPA loci that associate with RNAP-S5p (left hand column) or RNAP-S2p (right hand column) before and after TPA activation was determined at each CT position. Association of the locus with S5p or S2p sites before and after activation is independent of its position relative to the CT. (D) Combined detection of chromosome 10 (CT10, blue), the uPA gene (uPA-DNA, green), and uPA RNA (uPA-RNA, red) was performed by RNA- and DNA-FISH using a whole chromosome paint, the uPA fosmid probe, and Cy3-conjugated oligonucleotide probes. Nucleic acids were counterstained with TOTO-3 (insets). Arrowheads indicate the position of uPA-RNA and uPA-DNA signals. Bar: 2 µm. (E) The proportion of uPA genes that associate with uPA RNA before and after TPA activation was calculated at each CT position. Prior to activation, uPA genes at the CT interior are less frequently transcribed than loci outside the territory. After TPA treatment, the frequency of transcriptional events is the same at all CT positions. doi:10.1371/journal.pbio.1000270.g007

activation. Therefore, looping of the uPA locus out of its CT is not required for the association of the uPA gene with active transcription factories. To investigate whether the large-scale chromatin movements that accompany TPA induction of the uPA gene had an effect on the association of the flanking genes, CAMK2G and VCL, with active (S2p) factories, we performed triple labelling cryoFISH experiments for chromosome 10, the CAMK2G or VCL loci (detected with fosmid probes), and active (S2p) factories (Figure S7). We find that the association of CAMK2G or VCL with S2p also occurs with similar frequency at all locations relative to the CT, both before and after TPA activation (logistic regression analysis, p = 0.84 and 0.64 for CAMK2G and VCL, respectively; n(CAMK2G/−TPA) = 108, n(CAMK2G/+TPA) = 128, n(VCL/−TPA) = 147, n(VCL/+TPA) = 134). These results show that the association of the uPA, CAMK2G, and VCL genes with RNAP-S2p occurs independently of CT position. A recent analysis of gene activation induced by the insertion of a strong (β-globin) enhancer in a gene-rich region also showed no effect on the frequency of locus association with active transcription factories at different positions relative to the CT [43], although this region preferentially localises at the CT edge. In the case of the murine Hoxb locus, a small preferential association of Hoxb1 and flanking genes with active transcription factories is observed outside the CT upon retinoic acid treatment [36]. Different mechanisms of gene regulation may act on different genes and may depend on the kinetics of induction, given the much shorter activation (3 h) of the uPA gene by TPA treatment in comparison with the retinoic acid treatment for several days used to induce Hox genes. Finally, to investigate whether the CT position of the uPA gene has an influence on its transcriptional activity, we labelled the uPA gene, chromosome 10, and uPA transcripts simultaneously (Figure 7D, 7E).
We used the fosmid uPA probe for highest spatial resolution. We find that the uPA gene is transcribed with the same frequency irrespectively of its CT position upon TPA activation (logistic regression analysis, p = 0.74; n = 100). These results differ from the murine Igf2bp1 and Cbx1 genes, flanking the Hoxb cluster, which are also transcriptionally active at all CT positions, independently of Hoxb induction, but are preferentially active outside the CT [36]. Difficulties in the detection of the Hoxb1 transcripts did not allow a similar analysis of allelic transcription upon induction [36], to help establish how general the correlation is between gene positioning outside the CT and transcriptional states. Prior to activation, we unexpectedly found that the largest fraction of uPA loci, which are internal to the CT, are less likely to be transcriptionally active (logistic regression analysis, p = 0.0001; n = 103), whereas the smaller proportion of uPA loci not located at the preferred internal CT position is transcribed at the same frequency as upon TPA induction (Figure 7E). These results suggest that the internal CT positioning has a silencing effect on the primed uPA locus prior to its induction, which helps prevent transcript elongation or interferes with transcript stability, revealing unexpected properties of locus positioning within the nuclear landscape.

In summary, our analyses of the uPA gene prior to induction showed that it was (a) preferentially positioned at the interior of its CT; (b) in a poised state, characterized by open chromatin configuration and the presence of RNAP-S5p at regulatory regions; and (c) preferentially associated with poised, S5p+S2p− transcription factories. Transcriptional activation induces large-scale relocation of the gene towards the CT exterior and a preferred association with factories containing both (S5p and S2p) RNAP modifications, as expected in the active state. Although the correlation between looping out of the CT and the change in RNAP configuration suggested that the external position might favour transcriptional activation, triple-labelling experiments showed that the position of the uPA locus relative to its CT and the association with poised or active transcription factories are independent events. RNA-FISH experiments confirm that after TPA induction both external and internal positions of the uPA gene, with respect to its CT, are equally competent for transcription. However, positioning of the uPA locus inside the CT, before activation, may help control the levels of transcription, as uPA genes that are found outside of the CT before TPA treatment are more likely to be transcribed (Figure 7E). Our findings reinforce the idea that the interior of CTs is not repressive for the association of genes with transcription machinery, suggesting that large-scale chromatin movements are unlikely to be necessary for genes to find transcription factories, although they may influence the extent of association for specific subsets of genes.
This study expands current models of gene regulation by showing that silent genes can be associated with poised transcription factories and that factory association and gene position relative to the CT can be independent factors. Our results are also compatible with the notion that poised transcription factories represent a sub-population of specialized sites that may allow primed genes to respond rapidly and efficiently to specific activation signals.

Materials and Methods

A detailed description of the experimental procedures is given in Text S1.

Cell Culture, RNA Detection, and Western Blotting

HepG2 cells were cultured in the absence or presence of 100 ng/ml TPA (Sigma) for the indicated times as previously described [27]. Treatment of HepG2 cells with 1 µM flavopiridol (1 h; Sanofi-Aventis) was used for the inhibition of RNAP-S2p phosphorylation by CDK9, and 75 µg/ml α-amanitin (5 h; Sigma) to inhibit RNAP transcription. For the quantification of mature


(~150 nm thick) from HepG2 cells were treated ± AP prior to immunolabelling with phosphorylation-dependent RNAP antibodies. Sections were indirectly immunolabelled with antibodies against RNAP-S5p (4H8; C, E), or RNAP-S2p (H5; D, F). Absence of signal after pre-treatment of cryosections with AP (E, F) shows that 4H8 and H5 antibodies bind specifically to phosphorylated epitopes, and do not detect unphosphorylated RPB1. Nucleic acids were counterstained with TOTO-3 (insets). Bar: 2 µm. Found at: doi:10.1371/journal.pbio.1000270.s002 (8.10 MB TIF)

and unprocessed transcript levels of uPA, CAMK2G, or VCL genes, total RNA was extracted and amplified by RT-PCR. Western blotting was performed using total HepG2 protein extracts and antibodies specific to different RNAP phosphoforms. Experimental details and information about the antibodies used can be found in Text S1.

MN-ChIP and PCR Reactions

Chromatin cross-linking, MNase digestion, and immunoprecipitation were performed as described previously [31]. See Text S1 for primer sequences (Table S1), antibodies used, and experimental details.

Figure S3 Frequency of association of simulated uPA loci with RNAP-S5p and RNAP-S2p sites. (A) Diagram of the genomic location of the uPA gene and the regions covered by the BAC (RP11-417O11; ~228 kb) and fosmid (G248P85612C10; ~44 kb) probes used for FISH experiments. Arrows indicate the 5′-3′ transcription direction. (B, D) To analyse the frequency of association of a simulated uPA locus positioned at random coordinates with RNAP-S5p or -S2p sites, we generated a new image containing the original experimental S5p (B, D; green) or S2p (images unpublished) distribution, the experimental uPA signal (Exp-uPA; blue; arrowheads), and an additional, simulated uPA signal with the same number of pixels, but positioned at random nucleoplasmic coordinates (Siml-uPA; red; arrows). This analysis was performed for both BAC (B) and fosmid (D) experiments presented in Figures 3B, 3C, 3E, 3F and 5C, 5D, respectively. Nucleic acids were counterstained with TOTO-3 (insets). Bars: 2 µm. (C, E) Frequency of association of experimental and simulated uPA loci with RNAP-S5p and RNAP-S2p in the same experimental images of HepG2 cells treated ±TPA. Experimental uPA loci associate more frequently with S5p sites than simulated loci, positioned at random nucleoplasmic coordinates, both before and after TPA treatment, for both BAC (C) and fosmid (E) probes. In contrast, the level of association of experimental BAC or fosmid loci with S2p sites is similar to the levels of simulated (random) loci before, but not after, TPA activation. This confirms that the increased association of the uPA gene with S2p sites detected following activation is not due to random processes and is not affected by the size of the probe used. The numbers of simulated sites were n(BAC, S5p) = 68 and 62; n(BAC, S2p) = 69 and 75; n(fosmid, S5p) = 47 and 40; n(fosmid, S2p) = 50 and 46, for − and +TPA, respectively. Found at: doi:10.1371/journal.pbio.1000270.s003 (8.58 MB TIF)
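The random-placement control described in this legend can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' image-analysis pipeline: images are represented as sets of (x, y) pixel coordinates, the simulated locus keeps the shape and pixel count of the experimental signal, and all names (`overlaps`, `place_randomly`) and the toy coordinates are hypothetical. The "associated" criterion follows the scoring rule given in the figure legends (overlap by at least 1 pixel).

```python
import random

def overlaps(locus, sites):
    """Score 'associated' when the locus shares at least one pixel with the RNAP sites."""
    return not locus.isdisjoint(sites)

def place_randomly(locus, nucleoplasm, rng):
    """Move the locus (same shape, same pixel count) to a random position inside the nucleoplasm."""
    ox, oy = min(locus)  # reference pixel of the experimental signal
    shape = [(x - ox, y - oy) for x, y in locus]
    # Keep only reference positions for which every pixel stays within the nucleoplasm.
    valid = [(nx, ny) for nx, ny in sorted(nucleoplasm)
             if all((nx + dx, ny + dy) in nucleoplasm for dx, dy in shape)]
    if not valid:
        raise ValueError("locus does not fit inside the nucleoplasm mask")
    nx, ny = rng.choice(valid)
    return {(nx + dx, ny + dy) for dx, dy in shape}

rng = random.Random(0)
nucleoplasm = {(x, y) for x in range(40) for y in range(40)}         # toy nuclear mask
sites = {(x, y) for x in range(0, 40, 8) for y in range(0, 40, 8)}   # hypothetical S5p pixels
exp_locus = {(8, 8), (9, 8), (8, 9), (9, 9)}                         # hypothetical uPA signal

sim_locus = place_randomly(exp_locus, nucleoplasm, rng)
exp_associated = overlaps(exp_locus, sites)
trials = 1000
sim_freq = sum(overlaps(place_randomly(exp_locus, nucleoplasm, rng), sites)
               for _ in range(trials)) / trials
print(exp_associated, sim_freq)
```

Comparing the experimental association frequency with `sim_freq` over many images is what allows a χ² test to separate genuine association from chance overlap.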

Immunofluorescence, Ultracryosectioning, and cryoFISH

uPA protein expression was detected with specific rabbit antiserum antibodies. For high-resolution imaging using cryoFISH, ultrathin cryosections (~140–150 nm thick) were immunolabelled and/or labelled by fluorescence in situ hybridization (FISH) essentially as described before [14]. RNA-FISH was performed using oligonucleotide probes (http://www.singerlab.org/protocols). See Text S1 for information about the antibodies and probes used, and for experimental details.

Microscopy, Quantitative Image Analyses, and Statistics

Images were acquired by confocal microscopy and analysed quantitatively. Statistical analyses were performed using χ² test, logistic regression analysis, ANOVA, Student t-test, or Mann-Whitney U test. See Text S1 for further details.

Supporting Information Figure S1 H3K9me2 histone modification is present at the H19 gene promoter, but not the uPA gene promoter. Cross-linked, sonicated chromatin from ±TPA-treated HepG2 cells was digested with MN for 50 min before immunoprecipitation with antibodies that recognize lysine 9 dimethylated histone H3 (H3K9me2), associated with closed chromatin. Control ("unrelated") antibodies were polyclonal anti-uPAR antibodies. Immunoprecipitated DNA was amplified using primers spanning the 5′ portion of the imprinted H19 gene and the uP fragment of the uPA gene (see scheme in Figure 2D). Found at: doi:10.1371/journal.pbio.1000270.s001 (0.72 MB TIF)

Figure S4 The uPA gene loops out of its chromatin domain. (A) Diagram illustrating the genomic location of the uPA gene and the regions covered by the BAC (RP11-417O11; ~228 kb, blue) and fosmid (G248P85612C10; ~44 kb, red) probes used for FISH experiments. Arrows indicate the 5′-3′ transcription direction. (B) The association of BAC and fosmid signals (arrowhead) relative to RNAP-S2p sites (green) was determined simultaneously by immuno-cryoFISH before (unpublished image) and after (B) TPA activation for 3 h, using digoxigenin-labelled BAC (blue) and rhodamine-labelled fosmid (red) probes. High magnification images show examples of the co-association of both BAC- and fosmid-uPA signals with S2p sites (top) or the association of the fosmid-uPA signal, but not the BAC signal, with S2p sites (bottom). Nucleic acids were counterstained with DAPI (inset). Bar: 2 μm. (C) Frequency of the association of BAC or fosmid signals with S2p sites is similar between probes (χ² test, p = 0.37 and p = 0.81, n = 46 and 40, for − and +TPA, respectively). Error bars are standard deviations from two replicate experiments. (D) Fosmid signals (red) can loop out of BAC foci (green). Arrowheads indicate the position of BAC and fosmid signals. Insets show higher magnification images. Nucleic acids were counterstained with TOTO-3 (blue). Bar: 2 μm. Frequency of co-localisation of BAC foci with fosmid-uPA signals show that

Figure S2 Characterization of antibodies against different phosphorylated forms of RNAP. (A, B) Reactivity of different RNAP antibodies against hyper- (IIO) and hypophosphorylated (IIA) forms of the largest subunit of RNAP (RPB1) was assessed by Western blotting using total protein extracts from HepG2 cells treated for 1 h in the absence (A) or presence (B) of 1 μM flavopiridol, a specific inhibitor of CDK9, the Ser2 kinase. Both IIO and IIA bands are detected by antibody N-20 (A), raised against the amino-terminus of RPB1, which binds independently of phosphorylation. Antibodies against S5p (4H8 and H14) or S2p (H5) only detect the IIO band (A, B). Treatment of Western blots with alkaline phosphatase (AP; A) prior to immunolabelling reveals the specificity of 4H8, H14, and H5 antibodies for phosphorylated epitopes, and has no effect on the binding of an antibody to the N terminus of RPB1. The specificity of H5 antibodies to the S2p modification is shown by loss of binding in flavopiridol-treated samples (B). Binding of 4H8 and H14 antibodies to the IIO band is insensitive to flavopiridol treatment in these conditions, consistent with their specificity for the Ser5 modification (S5p) catalyzed by CDK7, as previously shown (B and [21]). Protein loading was controlled using histone H2B antibodies. (C–F) Cryosections


15%–20% of uPA alleles detected with the fosmid probe loop out from the BAC signals. The difference between the levels of fosmid looping ±TPA was not statistically significant (χ² test, p = 0.57; n = 47 and 41, for − and +TPA, respectively). Found at: doi:10.1371/journal.pbio.1000270.s004 (6.78 MB TIF)

position or TPA activation. The association of CAMK2G or VCL loci with RNAP-S2p was determined relative to the chromosome 10 territory (CT10) in HepG2 cells, before and after TPA activation (3 h), by cryoFISH using a whole chromosome 10 paint and digoxigenin-labelled CAMK2G and VCL fosmid probes. (A) Images represent examples of CAMK2G loci (red, arrowheads) that co-localise with RNAP-S2p (green) inside (left) or outside (right) of CT10 (blue). Nucleic acids were counterstained with DAPI (insets). Bar: 2 μm. (B) The proportion of CAMK2G or VCL loci that associate with RNAP-S2p was determined at each CT position (inside, edge, outside), before and after TPA activation. Association of either gene with S2p is independent of its position relative to the CT and of TPA treatment. Found at: doi:10.1371/journal.pbio.1000270.s007 (5.98 MB TIF)

Figure S5 Control experiments for cryo-RNA-FISH. (A–D) Cryosections (~150 nm thick) of HepG2 cells were hybridised with Cyanine3-labelled uPA oligonucleotide probes before (A) and after (B–D) TPA activation. Inspection of nucleoplasmic regions identifies frequent uPA-RNA signals in TPA-treated cells (B, arrowheads). Pre-incubation of sections with RNase A (C) or omission of oligonucleotide probes (D) abolishes most uPA-RNA signals within the nucleoplasm, demonstrating the specificity of the signal. Nucleic acids were counterstained with TOTO-3 (red). Bars: 2 μm. Found at: doi:10.1371/journal.pbio.1000270.s005 (8.38 MB TIF)

Figure S8 Examples of classification criteria for the association of uPA loci with RNAP-S5p and RNAP-S2p sites. The position of the uPA locus (red) with S5p and S2p sites (green) was determined by immuno-cryoFISH using a rhodamine-labelled BAC probe containing the uPA locus and antibodies specific for RNAP phosphorylated at residues S5 or S2 of the CTD. Associated uPA loci co-localise with S5p or S2p sites if signals overlap by at least a single pixel, whereas separated sites do not show overlap of the two signals and include loci that may touch an RNAP site without signal overlap. Found at: doi:10.1371/journal.pbio.1000270.s008 (4.11 MB TIF)
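The classification rule in this caption reduces to a set intersection on pixel coordinates. The short Python sketch below is only an illustration of that rule under the assumption that each segmented signal is available as a collection of (row, column) pixels; the name `classify_locus` is hypothetical and does not come from the original analysis.

```python
def classify_locus(locus_pixels, site_pixels):
    """Return "associated" if the two signals share at least one pixel,
    otherwise "separated" (this includes signals that merely touch)."""
    shared = set(locus_pixels) & set(site_pixels)
    return "associated" if shared else "separated"


if __name__ == "__main__":
    locus = [(5, 5), (5, 6)]
    overlapping_site = [(5, 6), (5, 7)]  # shares pixel (5, 6)
    touching_site = [(5, 7), (5, 8)]     # adjacent to the locus, no shared pixel
    print(classify_locus(locus, overlapping_site))
    print(classify_locus(locus, touching_site))
```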

Figure S6 Detection of the uPA gene with a fosmid probe recapitulates the CT looping and position-independent association with S5p and S2p factories observed with the BAC probe. (A) The position of the uPA locus (fosmid-uPA, green) relative to the chromosome 10 territory (CT10, red) was determined in HepG2 cells, before and after TPA activation for 3 h, by cryoFISH using a whole chromosome 10 paint and a digoxigenin-labelled uPA fosmid probe. The positions of uPA loci were scored as "inside," "inner-edge," "edge," "outer-edge," and "outside" relative to its CT as in Figure 1D. Nucleic acids were counterstained with TOTO-3 (blue). Arrowheads indicate uPA loci. Bar: 2 μm. The histogram shows that the fosmid-uPA probe recapitulates the TPA-induced CT looping observed with the BAC probe (Figure 1D), as expected. In the inactive state, the locus is preferentially localized at the CT interior (61% loci inside or at the inner-edge, n = 234 loci), and relocates to the exterior upon activation (58% loci at outer-edge or outside, n = 230 loci; χ² test, p<0.0001). (B) The proportion of uPA loci detected using the fosmid probe, which associate with RNAP-S5p before and after TPA activation, was calculated at each CT position (inside, edge, outside) as for the BAC probe (Figure 7C). Association of the locus with S5p sites before and after activation is independent of its position relative to the CT (logistic regression analysis; p = 0.18 and p = 0.26 before and after TPA, n = 68 and 71, respectively). Overall no effect of TPA treatment on the association of the uPA gene with S5p was detected (logistic regression analysis, p = 0.54). Association of uPA loci detected with the fosmid probe with S2p sites before and after activation is also independent of its position relative to the CT (logistic regression analysis, p = 0.27 and p = 0.79 before and after TPA, n = 113 and 140, respectively). This same analysis also detected an increased association of the uPA gene with S2p sites upon activation (logistic regression analysis, p = 0.0004). Found at: doi:10.1371/journal.pbio.1000270.s006 (5.63 MB TIF)

Table S1 MN-ChIP primers. List of primers used in MN-ChIP analyses in 5′ to 3′ orientation. Found at: doi:10.1371/journal.pbio.1000270.s009 (0.06 MB DOC)

Text S1 Supplementary information. Found at: doi:10.1371/journal.pbio.1000270.s010 (0.10 MB DOC)

Acknowledgments We thank Stefano Biffo, Marco E. Bianchi, André Möller, Julie K. Stock, and Emily Brookes for critically reading the manuscript; Alessandro Marcello and Peter R. Cook for advice on RNA-FISH protocols and probe design; and C. Covino (A.L.E.M.B.I.C., Milano, Italy) for help with confocal microscopy. Flavopiridol was generously provided by Sanofi Aventis and the National Cancer Institute, National Institutes of Health.

Author Contributions The author(s) have made the following declarations about their contributions: Conceived and designed the experiments: CF AP MPC. Performed the experiments: CF SQX PL DM. Analyzed the data: CF SQX FR AP MPC. Contributed reagents/materials/analysis tools: CF SQX MRB AP. Wrote the paper: CF SQX AP MPC. Performed the experiments on uPA expression (mRNA and protein), MN digestion of chromatin, MN-ChIP: CF PL. Performed immuno-cryoFISH experiments and analyses: SQX. Performed statistical analysis: FR AP.

Figure S7 uPA-flanking genes, CAMK2G and VCL, associate with S2p factories independently of CT

References
1. Misteli T (2007) Beyond the sequence: cellular organization of genome function. Cell 128: 787–800.
2. Fraser P, Bickmore W (2007) Nuclear organization of the genome and the potential for gene regulation. Nature 447: 413–417.
3. Pombo A, Branco MR (2007) Functional organisation of the genome during interphase. Curr Opin Genet Dev 17: 415–455.
4. Chambeyron S, Bickmore WA (2004) Chromatin decondensation and nuclear reorganization of the HoxB locus upon induction of transcription. Genes Dev 18: 1119–1130.
5. Volpi EV, Chevret E, Jones T, Vatcheva R, Williamson J, et al. (2000) Large-scale chromatin organization of the major histocompatibility complex and other regions of human chromosome 6 and its response to interferon in interphase nuclei. J Cell Sci 113: 1565–1576.
6. Finlan LE, Sproul D, Thomson I, Boyle S, Kerr E, et al. (2008) Recruitment to the nuclear periphery can alter expression of genes in human cells. PLoS Genet 4: e1000039. doi:10.1371/journal.pgen.1000039.
7. Guelen L, Pagie L, Brasset E, Meuleman W, Faza MB, et al. (2008) Domain organization of human chromosomes revealed by mapping of nuclear lamina interactions. Nature 453: 948–951.
8. Kumaran RI, Spector DL (2008) A genetic locus targeted to the nuclear periphery in living cells maintains its transcriptional competence. J Cell Biol 180: 51–65.
9. Reddy KL, Zullo JM, Bertolino E, Singh H (2008) Transcriptional repression mediated by repositioning of genes to the nuclear lamina. Nature 452: 243–247.
10. Kimura H, Sugaya K, Cook PR (2002) The transcription cycle of RNA polymerase II in living cells. J Cell Biol 159: 777–782.


11. Becker M, Baumann C, John S, Walker DA, Vigneron M, et al. (2002) Dynamic behavior of transcription factors on a natural promoter in living cells. EMBO Rep 3: 1188–1194.
12. Verschure PJ, van Der Kraan I, Manders EM, van Driel R (1999) Spatial relationship between transcription sites and chromosome territories. J Cell Biol 147: 13–24.
13. Abranches R, Beven AF, Aragon-Alcaide L, Shaw PJ (1998) Transcription sites are not correlated with chromosome territories in wheat nuclei. J Cell Biol 143: 5–12.
14. Branco MR, Pombo A (2006) Intermingling of chromosome territories in interphase suggests role in translocations and transcription-dependent associations. PLoS Biol 4: e138. doi:10.1371/journal.pbio.0040138.
15. Heard E, Bickmore W (2007) The ins and outs of gene regulation and chromosome territory organisation. Curr Opin Cell Biol 19: 311–316.
16. Ragoczy T, Bender MA, Telling A, Byron R, Groudine M (2006) The locus control region is required for association of the murine beta-globin locus with engaged transcription factories during erythroid maturation. Genes Dev 20: 1447–1457.
17. Ragoczy T, Telling A, Sawado T, Groudine M, Kosak ST (2003) A genetic analysis of chromosome territory looping: diverse roles for distal regulatory elements. Chromosome Res 11: 513–525.
18. Boehm AK, Saunders A, Werner J, Lis JT (2003) Transcription factor and polymerase recruitment, modification, and movement on dhsp70 in vivo in the minutes following heat shock. Mol Cell Biol 23: 7628–7637.
19. Spilianakis C, Kretsovali A, Agalioti T, Makatounakis T, Thanos D, et al. (2003) CIITA regulates transcription onset via Ser5-phosphorylation of RNA Pol II. EMBO J 22: 5125–5136.
20. Gomes NP, Bjerke G, Llorente B, Szostek SA, Emerson BM, et al. (2006) Gene-specific requirement for P-TEFb activity and RNA polymerase II phosphorylation within the p53 transcriptional program. Genes Dev 20: 601–612.
21. Stock JK, Giadrossi S, Casanova M, Brookes E, Vidal M, et al. (2007) Ring1-mediated ubiquitination of H2A restrains poised RNA polymerase II at bivalent genes in mouse ES cells. Nat Cell Biol 9: 1428–1435.
22. Saunders A, Core LJ, Lis JT (2006) Breaking barriers to transcription elongation. Nat Rev Mol Cell Biol 7: 557–567.
23. Phatnani HP, Greenleaf AL (2006) Phosphorylation and functions of the RNA polymerase II CTD. Genes Dev 20: 2922–2936.
24. Crippa MP (2007) Urokinase-type plasminogen activator. Int J Biochem Cell Biol 39: 690–694.
25. Look MP, Foekens JA (1999) Clinical relevance of the urokinase plasminogen activator system in breast cancer. APMIS 107: 150–159.
26. Van Veldhuizen PJ, Sadasivan R, Cherian R, Wyatt A (1996) Urokinase-type plasminogen activator expression in human prostate carcinomas. Am J Med Sci 312: 8–11.
27. Ibañez-Tallon I, Caretti G, Blasi F, Crippa MP (1999) In vivo analysis of the state of the human uPA enhancer following stimulation by TPA. Oncogene 18: 2836–2845.
28. Guillot PV, Xie SQ, Hollinshead M, Pombo A (2004) Fixation-induced redistribution of hyperphosphorylated RNA polymerase II in the nucleus of human cells. Exp Cell Res 295: 460–468.


29. Pombo A, Hollinshead M, Cook PR (1999) Bridging the resolution gap: Imaging the same transcription factories in cryosections by light and electron microscopy. J Histochem Cytochem 47: 471–480.
30. Pombo A, Jackson DA, Hollinshead M, Wang Z, Roeder RG, et al. (1999) Regional specialization in human nuclei: visualization of discrete sites of transcription by RNA polymerase III. EMBO J 18: 2241–2253.
31. Ferrai C, Munari D, Luraghi P, Pecciarini L, Cangi M, et al. (2007) A transcription-dependent MNase-resistant fragment of the uPA promoter interacts with the enhancer. J Biol Chem 282: 12537–12546.
32. Nightingale KP, O'Neill LP, Turner BM (2006) Histone modifications: signalling receptors and potential elements of a heritable epigenetic code. Curr Opin Genet Dev 16: 125–136.
33. Xie SQ, Martin S, Guillot PV, Bentley DL, Pombo A (2006) Splicing speckles are not reservoirs of RNA polymerase II, but contain an inactive form, phosphorylated on serine2 residues of the C-terminal domain. Mol Biol Cell 17: 1723–1733.
34. Zeitlinger J, Stark A, Kellis M, Hong JW, Nechaev S, et al. (2007) RNA polymerase stalling at developmental control genes in the Drosophila melanogaster embryo. Nat Genet 39: 1512–1516.
35. Jackson DA, Iborra FJ, Manders EM, Cook PR (1998) Numbers and organization of RNA polymerases, nascent transcripts, and transcription units in HeLa nuclei. Mol Biol Cell 9: 1523–1536.
36. Morey C, Kress C, Bickmore WA (2009) Lack of bystander activation shows that localization exterior to chromosome territories is not sufficient to up-regulate gene expression. Genome Res e-published ahead of print April 23, 2009, doi:10.1101/gr.089045.108.
37. Guenther MG, Levine SS, Boyer LA, Jaenisch R, Young RA (2007) A chromatin landmark and transcription initiation at most promoters in human cells. Cell 130: 77–88.
38. Muse GW, Gilchrist DA, Nechaev S, Shah R, Parker JS, et al. (2007) RNA polymerase is poised for activation across the genome. Nat Genet 39: 1507–1511.
39. Core LJ, Waterfall JJ, Lis JT (2008) Nascent RNA sequencing reveals widespread pausing and divergent initiation at human promoters. Science 322: 1845–1848.
40. Iborra FJ, Escargueil AE, Kwek KY, Akoulitchev A, Cook PR (2004) Molecular cross-talk between the transcription, translation, and nonsense-mediated decay machineries. J Cell Sci 117: 899–906.
41. Mintz PJ, Patterson SD, Neuwald AF, Spahr CS, Spector DL (1999) Purification and biochemical characterization of interchromatin granule clusters. EMBO J 18: 4308–4320.
42. Johnson C, Primorac D, McKinstry M, McNeil J, Rowe D, et al. (2000) Tracking COL1A1 RNA in osteogenesis imperfecta: splice-defective transcripts initiate transport from the gene but are retained within the SC35 domain. J Cell Biol 150: 417–432.
43. Noordermeer D, Branco MR, Splinter E, Klous P, van Ijcken W, et al. (2008) Transcription and chromatin organization of a housekeeping gene cluster containing an integrated beta-globin locus control region. PLoS Genet 4: e1000016. doi:10.1371/journal.pgen.1000016.



Computational Analysis of Whole-Genome Differential Allelic Expression Data in Human

James R. Wagner1, Bing Ge2, Dmitry Pokholok3, Kevin L. Gunderson3, Tomi Pastinen2,4, Mathieu Blanchette1*

1 School of Computer Science, McGill University, Montreal, Quebec, Canada, 2 McGill University and Genome Quebec Innovation Centre, Montreal, Quebec, Canada, 3 Illumina, San Diego, California, United States of America, 4 Department of Human and Medical Genetics, McGill University Health Centre, McGill University, Montreal, Quebec, Canada

Abstract

Allelic imbalance (AI) is a phenomenon where the two alleles of a given gene are expressed at different levels in a given cell, either because of epigenetic inactivation of one of the two alleles, or because of genetic variation in regulatory regions. Recently, Ge et al. have described the use of genotyping arrays to assay AI at a high resolution (~750,000 SNPs across the autosomes). In this paper, we investigate computational approaches to analyze this data and identify genomic regions with AI in an unbiased and robust statistical manner. We propose two families of approaches: (i) a statistical approach based on z-score computations, and (ii) a family of machine learning approaches based on Hidden Markov Models. Each method is evaluated using previously published experimental data sets as well as with permutation testing. When applied to whole genome data from 53 HapMap samples, our approaches reveal that allelic imbalance is widespread (most expressed genes show evidence of AI in at least one of our 53 samples) and that most AI regions in a given individual are also found in at least a few other individuals. While many AI regions identified in the genome correspond to known protein-coding transcripts, others overlap with recently discovered long non-coding RNAs. We also observe that genomic regions with AI not only include complete transcripts with consistent differential expression levels, but also more complex patterns of allelic expression such as alternative promoters and alternative 3′ ends. The approaches developed not only shed light on the incidence and mechanisms of allelic expression, but will also help towards mapping the genetic causes of allelic expression and identifying cases where this variation may be linked to diseases.

Citation: Wagner JR, Ge B, Pokholok D, Gunderson KL, Pastinen T, et al. (2010) Computational Analysis of Whole-Genome Differential Allelic Expression Data in Human. PLoS Comput Biol 6(7): e1000849. doi:10.1371/journal.pcbi.1000849

Editor: Wyeth W. Wasserman, University of British Columbia, Canada

Received December 15, 2009; Accepted June 2, 2010; Published July 8, 2010

Copyright: © 2010 Wagner et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This work was funded in part by Genome Canada (www.genomecanada.ca), Genome Quebec (www.genomequebec.com), and the National Science and Engineering Research Council of Canada (www.nserc.ca). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: blanchem@mcb.mcgill.ca

Introduction

In a diploid cell, each gene is present in two copies. The vast majority of microarray-based or RNA sequencing-based gene expression studies do not distinguish between the two copies and measure the sum of the expression of the two alleles. This hides the fact that the two alleles are not necessarily expressed at equal levels, a phenomenon called allelic imbalance (AI) [1]. The complete shut down of one allele results in monoallelic expression (ME). The most drastic example of ME is X-chromosome inactivation, where, in females, one of the two copies of the X chromosome is inactivated and packaged into heterochromatin [2]. Less drastic is random monoallelic expression, whereby a randomly selected copy of a gene or chromosomal region is silenced by epigenetic mechanisms (e.g. methylation). In contrast, imprinting results in parent-of-origin specific inactivation of the maternal or paternal allele, depending on the locus.

While monoallelic expression completely silences one of the two alleles, less drastic allelic expression differences can result from a heterozygous Aa regulatory site. For example, allele A of a transcription factor binding site may allow binding and result in normal expression of the target gene on that chromosome, while allele a may disrupt the binding site, resulting in lower expression. While the lower expression of allele a may be compensated by an increased transcription rate at allele A in heterozygous individuals, this may not be the case for individuals who are homozygous aa, which may result in phenotypic variation.

Researchers have tried to identify causative regulatory variants by measuring the total expression (i.e. expression of both copies) of a particular gene across multiple individuals, treating this as a Quantitative Trait Locus (eQTL), and mapping nearby cis-regulatory regions to the gene expression (reviewed in [3]). A key problem with this type of approach is that environmental differences across individuals can affect gene expression, making the mapping problem very challenging. Instead, a focus on the relative expression of two alleles within the same cell has been suggested to factor out environmental sources of variation, allowing for more sensitive and specific detection of epigenetic and genetic phenomena related to local control of gene expression [4]. Combining AI measurements obtained from a set of individuals with genotyping information about these same individuals, one can map cis-regulatory variants [5–8] or detect epigenetic variation in allelic expression [9,10].

PLoS Computational Biology | www.ploscompbiol.org

1

July 2010 | Volume 6 | Issue 7 | e1000849


Whole-Genome Differential Allelic Expression

Author Summary

Measures of gene expression, and the search for regulatory regions in the genome responsible for differences in levels of gene expression, is one of the key paths of research used to identify disease causing genes, as well as explain differences between healthy individuals. Typically, experiments have measured and compared gene expression in multiple individuals, and used this information to attempt to map regulatory regions responsible. Differences in environment between individuals can, however, cause differences in gene expression unrelated to the underlying regulatory sequence. New genotyping technologies enable the measurement of expression of both copies of a particular gene, at loci that are heterozygous within a particular individual. This will therefore act as an internal control, as environmental factors will continue to affect the expression of both copies of a gene at presumably equal levels, and differences in expression are more likely to be explicable by differences in regulatory regions specific to the two copies of the gene itself. Differences between regulatory regions are expected to lead to differences in expression of the two copies (or the two alleles) of a particular gene, also known as allelic imbalance. We describe a set of signal processing methods for the reliable detection of allelic expression within the genome.

Past studies with the goal of detecting AI have typically relied upon panels of SNPs with relatively low density, located in only a subset of transcribed genes of the genome [10–12]. A simple threshold for the ratios of expression of the two alleles at a heterozygous locus is usually established (e.g. 1.5 or 2-fold) and a gene is called as imbalanced based upon whether or not the SNP(s) within it exceed this threshold. Optimal AI profiling in a genome-wide manner would require high-density sampling of expressed heterozygous sites in the genome.

We recently generated the first large-scale, high-resolution assay of allelic expression [13]. In this study, Illumina genotyping arrays were used to measure differential allelic expression at 755,284 polymorphic sites in lymphoblastoid cell lines (LCL) derived from 53 CEU samples included in the HapMap project [14]. Because of the noise in single point AI measurements made at each heterozygous locus, sophisticated analytical methods are required to make the most out of this data. In this paper, we develop signal processing approaches for the accurate identification and delineation of transcripts with allelic imbalance, either in a single individual at a time, or in a collection of samples.

To our knowledge, no hypothesis-free computational approaches have been proposed for the analysis of this type of data. Detection of AI in Ge et al. [13] relied heavily upon RefSeq, Vega, and UCSC gene annotations, and SNPs were first partitioned into windows corresponding to these annotated regions as well as intergenic regions and windows with significant AI were reported.

Sophisticated bioinformatics approaches have been developed for a related, but simpler, problem in the past, that of detecting Copy Number Variants (CNV) or Loss Of Heterozygosity (LOH) in cancer cells using array-based Comparative Genomic Hybridization (CGH) [15–18] or genotyping arrays [19–25]. These include the PennCNV program [26] and the QuantiSNP program [27], that use a Hidden Markov Model related to one of the approaches considered here. However, CNV or LOH regions have properties that make them easier to detect than regions of allelic imbalance: (i) the signal, coming from genomic DNA, is generally quite strong, whereas gene expression can be very low; (ii) the number of copies of an allele is a small integer, whereas the allelic expression ratio is a real number; (iii) the regions affected are typically quite large, whereas AI can affect a single, short gene, or even only part of a gene. The approaches listed above are thus not easily applicable to the detection of AI in gene expression.

An alternate family of statistical approaches called change-point methods has been proposed for segmenting array CGH data into regions exhibiting consistent signals [28,29]. These non-parametric, model-free approaches have the benefit of segmenting real-numbered data without enforcing discretization. However, they are difficult to generalize to a situation like ours, where signals come from a mixture of discrete (sites with no expression, sites with expression but no imbalance) and continuous (sites with real-valued imbalance) state space.

In this paper, we introduce a family of signal processing approaches for the analysis of AI data obtained from genotyping arrays. We consider both statistical approaches (Z-score computation) and machine learning approaches (Hidden Markov Models) to identify transcripts that show AI and to quantify the latter. We introduce a new type of left-to-right HMM for the joint prediction of allelic imbalance in the 53 samples considered. Our algorithms are evaluated using permutation testing and succeed at identifying regions with known AI. Our approaches reveal that more than 25% of transcripts (coding or non-coding) are subject to differential expression between the two alleles and that patterns of AI are varied and complex. The tools and data sets described here will help biologists and geneticists to identify regions of allelic imbalance, understand the mechanisms at play, identify the genetic or epigenetic causative agents, and associate expression polymorphisms with disease susceptibility.

Methods Allelic Imbalance Data Allelic imbalance was assayed using Illumina Infinium Human1M/Human1M-Duo SNP bead microarrays. These arrays, originally designed for genotyping, have probes for approximately 1.1 million polymorphic sites from HapMap, of which 755,284 were used for this study. Each probe estimates the abundance of each of the two possible alleles in the sample. Normally, genomic DNA is hybridized onto the chip and the genotypes are easily inferred from the probe intensities. We have previously described how one can take advantage of this technology to measure allelic expression in a high-resolution, genome-wide manner [13]. Briefly, total RNA is extracted and cDNAs are synthesized based on a protocol on heteronuclear RNA, allowing us to measure unspliced primary transcripts [8]. The cDNA sample is hybridized onto the array and each probe estimates the abundance of each of the two alleles in the sample. In parallel, genomic DNA from the same cell line is hybridized, which provides the basis for normalization of the cDNA hybridization while providing us with the genotype of each sample. Details for the full process of experimentally obtaining the raw imbalance information, as well as the sample information, can be obtained from [13]. Data obtained from technical replicates show that although the total expression level (sum of RNA abundance in both alleles) measured at a given SNP is highly reproducible ($R^2 = 0.864$), single point allelic expression ratios are much noisier ($R^2 = 0.632$), especially for low expression levels (see 9). This suggests that careful data analysis is required to extract as much information as possible. Let $a_i = \{a_{i1}, a_{i2}\}$ be the set of two alleles present at polymorphic site $i$ in the population, for $i = 1, \ldots, n$ (the rare cases where three or more alleles exist at the same site are ignored in this study). For notational simplicity, we assume that the genome consists of a


single pair of chromosomes. In reality, the analysis that follows is repeated separately for each autosome. Genotype phasing consists of the decomposition of the genotype of an individual into its two homologous chromosomes. For individual $k$, let $x^k = x^k_1, x^k_2, \ldots, x^k_n$ and $y^k = y^k_1, y^k_2, \ldots, y^k_n$ be these two chromosomes, where $x^k_i, y^k_i \in a_i$. Phasing remains a computationally and statistically challenging problem [30]. In the case of HapMap individuals, phased genotypes are available, although they are not error free. Removal of SNPs not phased in CEU HapMap release R22 resulted in 755,284 SNPs which were utilized in our study.

Let $X^k_{DNA}(a_{i1})$ and $X^k_{DNA}(a_{i2})$ be the intensity read outs obtained from the probes interrogating site $i$ when hybridizing the genomic DNA of individual $k$. If individual $k$ is heterozygous at site $i$ (i.e. $x^k_i \neq y^k_i$), then we expect both $X^k_{DNA}(a_{i1})$ and $X^k_{DNA}(a_{i2})$ to be large. When it is homozygous, say for $a_{i1}$ (i.e. $x^k_i = y^k_i = a_{i1}$), we expect $X^k_{DNA}(a_{i1})$ to be large and $X^k_{DNA}(a_{i2})$ to be small. The genotype of an individual can thus be deduced from the ratio of the two measurements.

Consider now $X^k_{RNA}(a_{i1})$ and $X^k_{RNA}(a_{i2})$, the intensity read outs obtained from the probes interrogating site $i$ when hybridizing cDNA obtained from whole cell RNA extraction. When heterozygous site $i$ sits in a transcribed region with no allelic imbalance, both $X^k_{RNA}(a_{i1})$ and $X^k_{RNA}(a_{i2})$ will be relatively large. Any difference between the two may indicate allelic imbalance. Regions that are not transcribed will obtain low values for both alleles. We consider the following pair of observations at each site $i$:

(exonic and intronic) having roughly 1.3 times the SNP density as intergenic regions (one SNP per 3.5 kb in genic regions, one SNP per 4.5 kb in intergenic regions). Figure 1(a) shows the distribution of E over all genic and intergenic positions. The distribution of expression levels in gene regions is clearly bimodal: a good fraction of genes are not transcribed in LCL, and most but not all intergenic sites are not transcribed. Assuming that 50% of genes and 10% of intergenic sites are expressed, we can deconvolve these distributions to obtain the distribution of E for expressed and nonexpressed regions (Figure 1(b)). For two individuals, experiments were done in triplicates. As seen in Figure S1 (a) and (b), the technical noise in the measurement of both E and R is quite significant. As expected, R values are particularly noisy at low expression levels.

Identification of Transcripts with Allelic Imbalance The main problem addressed in this study is the statistically robust identification of genomic regions with significant and consistent allelic imbalance. We start by noting that the data is too noisy to accurately call imbalance based on each SNP individually (e.g. by simply using on Rki ), especially for regions whose expression level is relatively low. We thus consider approaches that take advantage of the fact that most regions with AI are relatively long and are expected to contain more than one SNP. Four main approaches were designed, implemented and compared. Each method aims to robustly assign a score AI(i) to each SNP i, so that SNPs that belong to transcripts with significant allelic imbalance obtain large (positive or negative) scores. In all our AI detection algorithms, AI is detected without reference to any kind of gene annotation, contrasting with the annotation-driven approach used by Ge et al. [13], which allows us to identify regions of AI whose boundaries does not necessarily correspond to annotated genes. The first three approaches consider data from each sample individually while the last considers data from all samples jointly in order to improve the detection of AI in individual samples. The four approaches considered are first summarized below and then described in details. The code implementing each algorithm is available at http://www.mcb. mcgill.ca/,blanchem/AI/code.zip.

 k  k X (ai1 )zXRNA (ai2 ) Eik ~log RNA k k XDNA (ai1 zXDNA (ai2 ) measures the total transcript abundance, and 0

1 k XRNA (ai1 ) B X k (a ) C i1 B C Rki ~ logB DNA C, k @ XRNA (ai2 ) A k (ai2 ) XDNA which measures the fold imbalance between the expression of the two alleles. Normalization with the DNA sample, which, for heterozygous sites, is known to be balanced, normalizes for probe sensitivity and biases. Values for E and R were collected at 755284 sites. Those sites are not uniformly distributed in the genome, with genic regions
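As a concrete illustration of the two per-site observations E and R defined in the Allelic Imbalance Data subsection, the following minimal Python sketch computes both from four probe intensities. The function name, the argument names and the use of the natural logarithm are assumptions made for illustration (the text does not state the base of the logarithm), and the intensities in the example are invented.

```python
import math

def expression_and_ratio(rna_a1, rna_a2, dna_a1, dna_a2):
    """Return (E, R) for one heterozygous site of one individual.

    rna_a1, rna_a2: cDNA probe intensities for alleles a_i1 and a_i2
    dna_a1, dna_a2: genomic DNA probe intensities for the same alleles
    Natural log is an assumption; the source only writes "log".
    """
    # E: total RNA signal normalized by total DNA signal
    e = math.log((rna_a1 + rna_a2) / (dna_a1 + dna_a2))
    # R: per-allele RNA/DNA ratios compared between the two alleles
    r = math.log((rna_a1 / dna_a1) / (rna_a2 / dna_a2))
    return e, r

# Hypothetical intensities: balanced DNA, four-fold RNA excess of allele 1
e, r = expression_and_ratio(rna_a1=800.0, rna_a2=200.0, dna_a1=500.0, dna_a2=500.0)
print(e, r)
```

A positive R indicates higher expression of the first allele and a negative R the opposite; at homozygous sites only E is informative.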

• Simple smoothing refers to the approach where the allelic imbalance log-ratio of a SNP is taken as the average of its own log-ratio and that of the m surrounding SNPs on either side.
• The Z-Score approach involves binning SNPs based on their expression level, assigning each SNP a Z-Score based on its own allelic imbalance ratio, and then determining the Z-Scores of windows of consecutive SNPs and assigning this score to each SNP within the window.
• The ergodic HMM approach models the AI data in a given individual as being generated by a Hidden Markov Model whose states correspond to different levels of total expression and allelic ratios.
• The left-to-right HMM approach is an extension of the ergodic model that allows using the AI data from all individuals in order to assess the frequency of AI at each site, and then uses those as site-specific priors on the transition probabilities to predict AI regions separately for each individual, but in the context of the data from other individuals.

Figure 1. Distribution of E values. (a) Distribution over genic/intergenic regions. (b) Deconvolution into expressed/non-expressed regions. doi:10.1371/journal.pcbi.1000849.g001

PLoS Computational Biology | www.ploscompbiol.org

with known AI - see below), but is far from being as accurate as the proposed Z-Score approach, because it leads to bleeding edges at transcript boundaries. We also investigated a version of the Z-Score approach where SNPs are not binned by expression level prior to Z-Score computation; this resulted in a small but significant decrease in accuracy, showing that the appropriate modeling of the dependency between the noise in allelic ratio and the total expression level is an important feature of our approach.

Single-Sample Ergodic Hidden Markov Model Approach

The linear nature of the data in question lends itself well to a Hidden Markov Model (HMM) in which each data point corresponds to a particular SNP, the hidden states correspond to qualitative descriptions of the allelic imbalance (e.g. positive imbalance, negative imbalance, no imbalance), and emissions correspond to the total expression $E_i$ and the allelic log-ratio $R_i$ observed at site $i$. We built an HMM consisting of a total of eight hidden states (see Figure 2a). Seven of these states correspond to SNPs that belong to expressed transcripts in the LCL sample in question, with various levels of imbalance: $S = \{S_{+++}, S_{++}, S_{+}, S_{0}, S_{-}, S_{--}, S_{---}\}$, corresponding to strongly positive imbalance ($S_{+++}$), moderately positive imbalance ($S_{++}$), slightly positive imbalance ($S_{+}$), balance ($S_{0}$), slightly negative imbalance ($S_{-}$), moderately negative imbalance ($S_{--}$) and strongly negative imbalance ($S_{---}$). There is also a state ($S_N$) that corresponds to SNPs located in regions that are predicted not to be transcribed, and for which allelic imbalance is meaningless. The emission probability for each state $s \in S$ is modeled with a pair of normal distributions for the E and R values, with parameters $(\mu_{E,s}, \sigma^2_{E,s})$ and $(\mu_{R,s}, \sigma^2_{R,s})$ respectively. Whereas both total expression E and allelic imbalance measurements R are observed at heterozygous sites, only the expression is measured at homozygous sites. In the latter case, the imbalance data is left unobserved (i.e. all 8 states are equally likely to have generated the R observation). Homozygous SNPs can thus be included in the model training and predictions, and can help delineate regions based on expression levels. An HMM with a realistic correspondence to the data can in principle be built with $2K+2$ states, where $K \geq 1$ represents the number of levels of positive (and negative) imbalance that the model represents.
Larger values of K should in principle be favorable as they allow a finer discretization of allelic ratios. Models with $K \in \{1,2,3,4\}$ were trained and the false discovery rate measured and compared (see False-Discovery Rate Estimation below). It was found that $K = 3$ performed better than $K = 1$ and $K = 2$, and similarly to $K = 4$ (Figure S2), so this value was used for both the ergodic and left-to-right models. Certain parameters of the HMM are trained using the Baum-Welch algorithm, while others are fixed. For $S_N$, the emission probability distribution for E is modeled non-parametrically by the histogram of Figure 1(b) (black curve) whereas all expressing states share the same total expression distribution from Figure 1(b) (red curve). These emission probability distributions are kept constant during the training procedure. The Baum-Welch algorithm [31] is used to find maximum likelihood estimators for $\mu_{R,s}$ and $\sigma^2_{R,s}$, for $s \in S$, as well as all transition probabilities and the initial state probability. The Baum-Welch algorithm is an expectation-maximization (EM) [32] approach that alternates between the Expectation step (or E-step), in which the posterior probability over states is computed for each site using the Forward-Backward algorithm, and the Maximization step (or M-step), where the parameters of the emission and transition probability distributions are adjusted to best reflect the observed data given these posterior probabilities. Formulas for updating the

Simple Smoothing Approach

Consider heterozygous site $i$ and define window $W(i,m)$ to be the set consisting of $m$ heterozygous sites to the left of $i$, $m$ heterozygous sites to the right of $i$, and $i$ itself. The simple smoothing approach estimates $AI_{smoothing}(i) = \sum_{j \in W(i,m)} R_j / (2m+1)$. Any site $i$ with $|AI_{smoothing}(i)| > t_{smoothing}$ would then be reported as having imbalance, for some appropriate threshold $t_{smoothing}$. Based on False Discovery Rate assessment (described below), a value of $m = 4$ was determined to be the optimal window size and was used for all results reported.
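The smoothing score above amounts to a moving average over heterozygous sites. The sketch below is one possible reading, not the authors' released code: the function name is hypothetical, the input is assumed to be the list of R values at heterozygous sites in genomic order, and the handling of chromosome ends (averaging over the sites actually available) is an assumption, since the text does not say how truncated windows are treated.

```python
def simple_smoothing(r, m=4):
    """Average each site's log-ratio with up to m neighbours on each side.

    r: allelic log-ratios at heterozygous sites, in genomic order
    m: half-window size (the text reports m = 4 as optimal)
    For interior sites the divisor equals 2m + 1, as in the formula;
    near the ends the window is truncated (assumption).
    """
    scores = []
    for i in range(len(r)):
        window = r[max(0, i - m): i + m + 1]
        scores.append(sum(window) / len(window))
    return scores

def call_imbalance(scores, t):
    """Flag sites whose absolute smoothed score exceeds threshold t."""
    return [abs(s) > t for s in scores]
```

Because every site in a window receives equal weight, a strongly imbalanced transcript will raise the scores of balanced sites just outside its boundary.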

Z-Score Approach

At sites with no allelic imbalance, the value of $R_i$ is modeled adequately using a normal distribution centered at 0. However, the variance is inversely correlated with the total expression $E_i$, as AI is difficult to estimate when the total expression is low (see Figure S1b). The range of possible values of E is subdivided into 100 bins of equal size and the mean $\mu_b$ and variance $\sigma^2_b$ of R values were determined for SNPs belonging to every expression level bin $b$. A site-specific Z-Score $Z(i)$ is assigned to heterozygous site $i$ as $Z(i) = (R_i - \mu_{bin(E_i)}) / \sigma_{bin(E_i)}$. Homozygous sites, being uninformative with respect to allelic ratios, are excluded from the analysis. Consider now a collection of $w$ consecutive heterozygous SNPs $i_1, i_2, \ldots, i_w$ (ignoring possibly intervening homozygous sites). We define the regional Z-score as

$$Z(i_1, i_2, \ldots, i_w) = \frac{\sum_{k=1 \ldots w} Z(i_k)}{\sqrt{w}}.$$

Assuming the normality of noise in $R_i$ measurements, $Z(i_1, i_2, \ldots, i_w)$ follows a Normal(0,1) distribution under the null hypothesis of absence of allelic imbalance. Regional Z-Scores are first computed for every possible window of $w = 1 \ldots 50$ heterozygous sites. The region with the highest regional Z-score (in absolute value), $Z_{max}$, is selected first and we set $AI_{zscore}(i) = Z_{max}$ for all heterozygous sites $i$ within the region. This region is then masked out and the next highest scoring non-overlapping window is selected. The process is repeated until all heterozygous sites have a Z-Score assigned. We note that because $AI_{zscore}(i)$ is obtained based on the best window that contains site $i$, there is a complex issue of multiple hypothesis testing, so this measure will not follow a Normal(0,1) distribution under the null hypothesis (i.e. absence of AI). In consequence, one cannot easily translate $AI_{zscore}(i)$ into a p-value. We also considered a variant of the Z-Score approach where each SNP is assigned the Z-Score of the fixed-size window centered around it.
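The two steps of the main Z-Score procedure (per-bin standardization, then greedy selection of the best non-overlapping windows) can be sketched as follows. This is an illustrative reading under stated assumptions rather than the authors' implementation: function names are hypothetical, population variance is used within each bin, bins with zero variance fall back to a standard deviation of 1, and the greedy loop is realized by sorting all candidate windows by absolute regional Z-score.

```python
import math

def site_z_scores(e, r, n_bins=100):
    """Z(i) = (R_i - mu_bin) / sigma_bin, with equal-width bins over the range of E."""
    lo, hi = min(e), max(e)
    width = (hi - lo) / n_bins or 1.0          # guard against a zero-width range
    bins = [min(int((x - lo) / width), n_bins - 1) for x in e]
    groups = {}
    for b, x in zip(bins, r):
        groups.setdefault(b, []).append(x)
    stats = {}
    for b, xs in groups.items():
        mu = sum(xs) / len(xs)
        var = sum((x - mu) ** 2 for x in xs) / len(xs)
        stats[b] = (mu, math.sqrt(var) or 1.0)  # assumption: sigma = 1 if no spread
    return [(x - stats[b][0]) / stats[b][1] for b, x in zip(bins, r)]

def regional_z_assign(z, max_w=50):
    """Assign to each site the regional Z of the best unmasked window containing it."""
    n = len(z)
    prefix = [0.0]
    for v in z:
        prefix.append(prefix[-1] + v)
    windows = []
    for start in range(n):
        for w in range(1, min(max_w, n - start) + 1):
            score = (prefix[start + w] - prefix[start]) / math.sqrt(w)
            windows.append((abs(score), start, w, score))
    windows.sort(key=lambda t: -t[0])           # highest |regional Z| first
    ai = [None] * n
    for _, start, w, score in windows:
        # Accept a window only if none of its sites has been masked yet
        if all(ai[i] is None for i in range(start, start + w)):
            for i in range(start, start + w):
                ai[i] = score
    return ai
```

Since windows of width 1 are always candidates, every heterozygous site ends up with a score. Enumerating all windows costs on the order of n times 50 entries per chromosome, which is acceptable for a sketch but could be made more economical.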
This fixed-window variant, which can be seen as an improved version of our simple smoothing approach, indeed improves on the latter (based on permutation testing and comparison to transcripts


Figure 2. Architecture of the two Hidden Markov models used in this study. (a) Ergodic HMM architecture. HistoExp and HistoNoExp refer to the distributions depicted in Figure 1(b). For readability, states $S_{+++}$ and $S_{---}$ are not shown. (b) Multi-sample left-to-right HMM architecture. States $S_{+++}$, $S_{++}$, $S_{--}$, and $S_{---}$ are not shown for clarity. Only transition probabilities are trained. All copies of a given state have the same emission probability distribution, described on their left. doi:10.1371/journal.pcbi.1000849.g002

base pairs will be $T^l$, which is efficiently computed using the eigenvalue decomposition of $T$. To ensure that our training procedure was not subject to overfitting, we used 2-fold cross validation (dividing the 53 samples into one 26-sample data set and one 27-sample data set) and trained our 8-state ergodic HMM separately on each half of the samples. The parameters and transition probabilities obtained were nearly identical, and so were the FDR estimates obtained by running each HMM on the complementary data set, indicating that overfitting is not an issue.
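The distance correction used for the ergodic HMM (a unit matrix $T$ taken as the d-th root of the trained transition matrix, then raised to the power $l$) can be sketched with NumPy as below. The function and argument names are hypothetical, and the sketch assumes the trained matrix is diagonalizable with positive real eigenvalues; the final clipping and row renormalization are an added safeguard against round-off, not something the text describes.

```python
import numpy as np

def distance_transition(p_trained, avg_dist, l):
    """Return T^l, where T = p_trained^(1/avg_dist), via eigendecomposition.

    p_trained: row-stochastic matrix from homogeneous Baum-Welch training
    avg_dist:  average distance d (bp) between consecutive SNPs
    l:         distance (bp) between the two sites of interest
    """
    vals, vecs = np.linalg.eig(p_trained)
    # T^l = V diag(lambda^(l/d)) V^-1, since T = V diag(lambda^(1/d)) V^-1
    powered = vecs @ np.diag(vals.astype(complex) ** (l / avg_dist)) @ np.linalg.inv(vecs)
    t_l = np.clip(np.real(powered), 0.0, None)    # safeguard (assumption)
    return t_l / t_l.sum(axis=1, keepdims=True)

# Hypothetical 2-state example: sites closer than average stay put more often
p = np.array([[0.9, 0.1], [0.2, 0.8]])
print(distance_transition(p, avg_dist=4000.0, l=400.0))
```

When $l$ equals the average distance the trained matrix is recovered, and as $l$ shrinks the result approaches the identity, which is the intended increase in self-loop probability for nearby sites.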

emission probability parameters and transition probabilities are adapted straightforwardly from Mitchell [33]. We considered training one HMM per individual (which would allow the flexibility to model inter-experiment variation in noise, for example), or training a single HMM based on the data from all individuals (which would have the benefit of being based on more data). The latter option produced slightly better results and this is the strategy we used for the rest of the study. We also considered filtering out sites with low total expression, as their allelic expression ratio may be less reliable. However, slightly better results were obtained without any filtering (allowing non-expressed SNPs to naturally be classified as belonging to state $S_N$). Training on the whole data set took less than 20 Baum-Welch iterations and 3 hours to converge on a standard desktop computer (convergence is defined as two consecutive iterations where no parameter or transition probability changed by more than $10^{-5}$ or 1% of their value). Restarts from different initial values converged to nearly the same values. The Viterbi algorithm [34] can then be used to identify, in each individual, predicted regions of different levels of positive or negative imbalance. The Forward-Backward algorithm [35] yields an estimate of the posterior probability of each state at each site. In the latter case, a useful summary score for each site is the posterior expected allelic expression log-ratio, which we use as AI predictor: $AI_{ergodic}(i) = \sum_{s \in S} \Pr[S_i = s \mid E_{1..n}, R_{1..n}] \cdot \mu_{R,s}$. Until now we have assumed homogeneous transition probabilities, regardless of the distance in base pairs between consecutive SNPs along the chromosome. However, a more accurate model would factor in the distance between neighboring SNPs, to increase the probability of self-loops (i.e. staying in the same state) when the two sites are nearby but increase the probability of state change for two distant sites.
Such an approach has been used previously in HMMs designed to detect CNVs [27]. We obtained a unit transition probability matrix T as the d-th root of the transition matrix obtained via Baum-Welch training of the homogeneous model, where d is the average distance (in base pairs) between two consecutive SNPs in our data. Then, the transition probability matrix used for a pair of sites separated by l

Multi-Sample Left-to-Right HMM Approach

The previous HMM is called ergodic because it models an ergodic, homogeneous Markov chain over the state space (i.e. the set of transition probabilities is independent of the position along the genome). One limitation of this HMM is that it does not take full advantage of the fact that data exists for multiple individuals and that, while not all individuals are expected to have AI in exactly the same regions, one does expect AI hotspots where a significant fraction of the individuals would have imbalance. That would be the case, for example, for genes where one allele is commonly or always silenced via epigenetic mechanisms, or when AI is due to a common regulatory variant. The approach proposed in this section aims at predicting AI regions separately in each individual, while taking into consideration the data observed in all individuals. In doing so, we still want to be able to identify AI regions that are unique to a given individual, but are hoping to improve the detection of regions with common AI. For example, AI regions containing only a few SNPs, or those where the imbalance is only moderate, may be missed when present in a single individual, but may be detectable if present in a large fraction of the population. In addition, we may be able to detect boundaries of AI regions with more accuracy when they are shared among individuals. The approach utilized to address this is termed the left-to-right HMM [35] (see Figure 2 (b)), similar to profile HMMs [36]. Each site has its own copy of the set of states and transitions can only occur between states associated with neighboring sites, from left to right. Each copy of a given state shares the same emission probability distributions that are modeled the same way as with the ergodic HMM. However, transition probabilities will vary across positions, making the model non-homogeneous (in contrast to our ergodic HMM approach). This configuration allows for greater fine tuning at the level of each individual SNP or region, though at the cost of a substantially larger set of transition probabilities to be learned.

The training of our left-to-right HMM is a two-stage process. In the first stage, emission probabilities, transition probabilities, and start probabilities are estimated for the ergodic version of the HMM using the Baum-Welch algorithm described above, using all available individuals. The parameters of the emission probabilities of the states in the left-to-right HMM will be set to those obtained on the ergodic training and will not be re-estimated. The obtained ergodic non-homogeneous distance-corrected transition probabilities will be used as prior for those of the left-to-right HMM. In the second stage, we now switch to learning the transition probabilities of the left-to-right HMM. We assume that the data set from each individual is the result of an independent run of the HMM: $\Pr((E^1,R^1),(E^2,R^2),\ldots,(E^k,R^k) \mid HMM) = \prod_{i=1 \ldots k} \Pr(E^i,R^i \mid HMM)$, and we seek to identify the set of transition probabilities of the left-to-right HMM that maximizes this joint likelihood. Consider a site $i$ that is not imbalanced in any individual but where site $i+1$ is positively imbalanced in a large fraction of the individuals. The maximum likelihood estimator for the transition from state $S_0(i)$ to state $S_+(i+1)$ will be higher than at other positions where few individuals enter an imbalanced region. Now consider an individual where there is only weak evidence of AI starting at position $i+1$. When using an ergodic HMM for our predictions, the weak AI region will probably not be detected.
However, in the left-to-right HMM, with the increased transition probability, the AI path becomes more likely, so provided that there is sufficient imbalance, the most likely path may now go through one of the imbalanced states. Estimating transition probabilities between two sites separated by $l$ base pairs is done using a simple modification to the standard Baum-Welch algorithm, where the update rule for transitions is:

$$t'_{i,i+1}(a,b) = \frac{\sum_{j=1 \ldots k} \Pr(S^j_i = a, S^j_{i+1} = b) + W \cdot T^l(a,b)}{\sum_{j=1 \ldots k} \Pr(S^j_i = a) + W},$$

where $T^l$ is the l-th power of the unit transition probability matrix obtained previously and $W$ indicates the pseudocount weight described below. The regularization obtained by using the ergodic transition probability as prior reduces the risks of overfitting while improving the convergence of the training procedure. In practice, based upon permutation tests and resulting FDR scores, a parameter of $W = 1$ was determined to be optimal (data not shown). Once the left-to-right HMM is trained using the data from all 53 individuals (which took 161 Baum-Welch iterations, less than 4 hours on a standard desktop computer), the standard Viterbi or Forward-Backward algorithms are used to identify AI regions separately for each individual. As with the case of the ergodic HMM, we use the posterior expected allelic expression log-ratio $AI_{LtoR}(i)$ to summarize AI evidence at SNP $i$. Overfitting is a possible issue with our left-to-right HMM, as the number of parameters estimated is much larger than for the ergodic HMM. We performed 5-fold cross-validation, training on 4/5 of the data and predicting on 1/5. Thanks to our regularization procedure, the predictions obtained were very similar to those obtained by training and testing on the full data set, with only a marginal decrease in FDR.
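The regularized update of a single site-specific transition matrix can be illustrated with the short sketch below. It assumes that the pairwise posteriors $\Pr(S^j_i = a, S^j_{i+1} = b)$ have already been obtained for each individual $j$ (for example from a Forward-Backward pass), and that the marginal $\Pr(S^j_i = a)$ is the row sum of those pairwise posteriors. The function name and data layout are hypothetical.

```python
def update_transition(pair_post, prior, w=1.0):
    """One regularized M-step update for the transitions between sites i and i+1.

    pair_post[j][a][b]: posterior Pr(S_i = a, S_{i+1} = b) in individual j
    prior[a][b]:        distance-corrected ergodic transition T^l(a, b)
    w:                  pseudocount weight W (the text reports W = 1)
    """
    n_states = len(prior)
    new_t = []
    for a in range(n_states):
        # Denominator: sum over individuals of Pr(S_i = a), plus the pseudocount
        denom = sum(sum(post[a]) for post in pair_post) + w
        row = [
            (sum(post[a][b] for post in pair_post) + w * prior[a][b]) / denom
            for b in range(n_states)
        ]
        new_t.append(row)
    return new_t

# Hypothetical two-state, two-individual example
pair_post = [[[0.5, 0.5], [0.0, 0.0]], [[0.0, 1.0], [0.0, 0.0]]]
prior = [[0.9, 0.1], [0.1, 0.9]]
print(update_transition(pair_post, prior, w=1.0))
```

Each updated row still sums to one whenever the prior rows do, and a state that no individual is likely to occupy at site $i$ simply keeps its prior transition row.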

Cross-Hybridization

Upon study of some of the regions where AI was predicted in most or all individuals but where no known imprinted regions existed, we found that nearly half were a likely artifact of cross-hybridization. All these suspicious regions were the results of a segmental duplication, where a fragment of a gene was duplicated. Because the fragment still matches the genic region, sites within it will appear to be expressed (as they match the transcript of the paralogous region), and polymorphisms will cause mismatches between the probe and the true transcript, which will result in apparent AI. We thus used the human Blastz self-alignment from the UCSC Genome Browser [37,38] to filter out regions corresponding to recent duplications. A possible alternate approach would consist of using the results of the genomic DNA hybridization to identify probes that match more than one location in the genome, with the possible added benefit of detecting possible DNA copy-number variation.

False-Discovery Rate Estimation

Due to the relatively small number of "gold standard" regions known to exhibit AI, the best available option for comparison of the various models is through permutation tests. The goal was to preserve some of the structure of the genome such that only SNPs with approximately equal expression levels and heterozygosity would be swapped, i.e., the only factor that is swapped freely is that of the allelic imbalance ratio. Permuted data sets were generated as follows. Sites were partitioned into five levels based on the number of individuals in which they are heterozygous. Five bins were also assigned based on the average level of expression seen across all individuals. Each SNP was then finally assigned to one of 25 bins, with one bin for each of the possible combinations of heterozygosity frequency and expression levels. Sites were randomly permuted within each bin, preserving the correspondence between sites in different individuals (in the case of the left-to-right HMM, the first stage of training of global HMM parameters was first done on non-permuted data, and then the second stage of model training was done on permuted data). Preserving expression levels and heterozygosity is important to create permuted data sets that are as realistic as possible, in particular with respect to the fact that expressed sites are found in contiguous genomic regions rather than dispersed randomly in the genome. Each of the prediction methods described produces one AI score per site and per individual. For each method M, the number of regions of consecutive SNPs exceeding a given score threshold t, $N_{real}(t,M)$ and $N_{perm}(t,M)$, was determined in the real and permuted data, resulting in a False-Discovery Rate of

$$FDR(t,M) = \frac{N_{perm}(t,M)}{N_{real}(t,M)}.$$
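The permutation scheme and the region-level FDR can be sketched as follows. This is an illustration under assumptions, not the published code: function names are hypothetical, each site is assumed to carry a single bin label combining its heterozygosity and expression levels, a region is taken to be a maximal run of consecutive SNPs whose absolute score exceeds the threshold, and the same site permutation is meant to be applied to every individual so that the correspondence between individuals is preserved.

```python
import random

def permute_within_bins(site_bins, rng):
    """Return a permutation of site indices that only exchanges sites sharing a bin."""
    by_bin = {}
    for idx, b in enumerate(site_bins):
        by_bin.setdefault(b, []).append(idx)
    perm = list(range(len(site_bins)))
    for members in by_bin.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        for dst, src in zip(members, shuffled):
            perm[dst] = src          # position dst receives the data of site src
    return perm

def count_regions(scores, t):
    """Count maximal runs of consecutive SNPs with |score| > t."""
    count, inside = 0, False
    for s in scores:
        above = abs(s) > t
        if above and not inside:
            count += 1
        inside = above
    return count

def fdr(real_scores, perm_scores, t):
    """FDR(t) = N_perm(t) / N_real(t); undefined (NaN) if nothing is called."""
    n_real = count_regions(real_scores, t)
    n_perm = count_regions(perm_scores, t)
    return n_perm / n_real if n_real else float("nan")
```

In use, the AI scores for the permuted data would come from rerunning a prediction method on the permuted E and R values, and the threshold $t$ would be swept to trace out a curve of FDR against the number of predicted regions.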

Results

Each of our four approaches was applied to the data set and the AI predictions for each individual are available at http://www.mcb.mcgill.ca/~blanchem/AI/AIPredictions.zip.

Illustrative Case Studies

We use two examples to highlight the features of the data and the methods developed. Figure 3 gives a sample of the raw data and predictions made by each method in the BLK locus. BLK is a gene that has previously been described as allelically imbalanced in LCL [13]. Interestingly, in this individual, two other neighboring genes have strong allelic imbalance, with FAM167A showing expression on the opposite allele compared to BLK and GATA4 also obtaining strong and consistent signals. Although in this example the boundaries of allelic expression domains align nicely with known gene boundaries, this is not the case in general. As is obvious from the figure, the raw expression and allelic ratio data are quite noisy. The simple smoothing approach succeeds at identifying the main regions of allelic imbalance but does so much less reliably and precisely than the other three approaches. Notice that this individual has no heterozygous sites in the 5' end of FAM167A. This results in different behaviors for each method. The ergodic approach assigns gradually decreasing expected allelic log-ratios in that region, while the Z-Score approach only predicts imbalance in the 3' end of the gene. However, the left-to-right HMM has the benefit of considering data from other individuals, which have some heterozygous sites in the 5' region of the gene, which allows it to predict strong and consistent negative allelic log-ratios over the whole gene, and a sharp transition entering the BLK transcript. A similar phenomenon is observed for GATA4.

Figure 3. Raw data and predictions. Example of genomic region with allelic imbalance. From top to bottom: Raw allelic log-ratio; Simple smoothing predictions; Z-score predictions; Ergodic 8-state predictions (expected allele log-ratio); Left-to-right 8-state HMM predictions (expected allele log-ratio); Raw total expression; UCSC known genes track. Data shown is for HapMap individual NA11840. Note: Allelic ratios at homozygous sites are not shown. doi:10.1371/journal.pcbi.1000849.g003

Figure 4 shows the set of predictions made by the Viterbi algorithm using the left-to-right HMM on the extended GATA3 locus, in all 53 samples. The region exhibits a large diversity of patterns of AI. In some cases, the region of AI closely matches an annotated gene (e.g. SFMBT2 in several individuals). Often, AI regions do not overlap any known gene (e.g. the region located upstream of SFMBT2). Such regions, especially when they abut an annotated gene, may reflect the presence of alternative allele-dependent promoters. They may also represent completely novel unannotated transcripts. Another frequently observed pattern is the presence of AI within annotated transcripts, near the 5' or 3' end (e.g. the 3' end of the ITIH5 gene). Finally, AI regions often encompass one or more complete genes (e.g. GATA3 and NM_207423), possibly because of epigenetic modification of one of the two alleles. We note based on analysis done in [13] that SFMBT2 and ITIH5 show evidence of heritable allelic expression, whereas GATA3 does not show correlation with common genetic variants and could represent epigenetic modification of expression in LCLs.

Evaluation and Validation

The accuracy of the AI predictions made by each method was evaluated using both permutation testing (in order to assess the false discovery rate) and comparison to previously characterized AI transcripts.

Permutation Testing

We first estimated the false-discovery rate (FDR) of each method using a permutation test where genomic sites are randomly permuted, subject to some constraints (preservation of heterozygosity and expression level; see Methods). This randomized data set preserves the level of imbalance observed at each site, but randomly disperses sites in such a way that few regions are expected to exhibit strong and consistent allelic ratios over several consecutive sites (as real AI transcripts should). For each algorithm, the number of genomic regions with AI score above some threshold t in the real data was compared to the corresponding number on the permuted data; the ratio of these two numbers is an estimate of the FDR of the algorithm (note that the FDR could also be estimated at the individual SNP level, rather than at the region level; the conclusions are the same). Figure 5 shows the FDR curves obtained for each method, as a function of the number of predictions made. All methods are able to detect the most obvious cases of AI (roughly 200 regions per individual, where all methods have near-zero FDR). However, as our threshold decreases and the number of regions predicted increases, the performance of the four approaches becomes quite different. Setting 5% as an acceptable FDR, the simple smoothing, Z-Score, ergodic HMM, and left-to-right HMMs result in 360, 622, 662, and 954 predicted regions with AI. In other words, at that FDR level, the best approach, left-to-right HMM, is ~160% more sensitive than the simple smoothing approach and ~45% more sensitive than the second best approach, which is the ergodic HMM. Similar observations hold for other FDR thresholds. Therefore, the information obtained from the total expression levels, as well as the added site-specific transition probabilities, are beneficial in terms of obtaining reliable AI predictions. This is particularly noteworthy for regions whose AI is weaker (those ranking between the 500th and 1000th per individual), for which the FDR remains quite low with the left-to-right HMM but quickly increases with all other methods.

Figure 4. Allelic imbalance in 53 HapMap individuals in the GATA3 locus. Each row reports the sites where AI has been predicted by the 8-state left-to-right HMM with the Viterbi algorithm. Each AI SNP is marked with a vertical black line; the impression of gray levels is an artifact of SNP density. Genes from RefSeq [44] are illustrated below. doi:10.1371/journal.pcbi.1000849.g004

Figure 5. False discovery rates (FDR) obtained by permutation testing at thresholds resulting in different numbers of AI regions being predicted. doi:10.1371/journal.pcbi.1000849.g005

Comparison to Known AI Transcripts Although no comprehensive set of validated AI transcripts exists to date, a set of 62 imprinted genes (containing 1099 SNPs in our data

Figure 6. Enrichment for SNPs called as allelically imbalanced in imprinted and AI genes. (a) Overlap with regions experimentally verified to be imprinted. (b) Overlap with experimentally validated imbalanced genes from Verlaan et al. [8]. doi:10.1371/journal.pcbi.1000849.g006

PLoS Computational Biology | www.ploscompbiol.org

9

July 2010 | Volume 6 | Issue 7 | e1000849


Whole-Genome Differential Allelic Expression

set) have been collected from the literature and posted on www. geneimprint.com. Most imprinted regions are easily detected by most methods, as they affect relatively large genomic regions and their allelic expression ratios are extremely large. Figure 6 shows how the enrichment of the overlap between imprinted genes and the predictions made by each of the four methods varies as a function of the number of sites being predicted with AI. (The enrichment of the overlap between a set of predicted AI regions and a set of annotated regions is the ratio of the size of the overlap to the expected size of the overlap if AI regions had been selected randomly in the genome.) Imprinted SNPs are enriched 5 to 20-fold among the top predictions made by each algorithm (except the Z-Score approach, which assigns high scores to other types of regions). Focussing on the left-to-right HMM AI predictions at a 5% FDR threshold (which consist of roughly 40,000 SNPs per individual), we find that 67% (resp. 35%) of SNPs in imprinted regions are predicted to have AI in at least one (resp. five) individual. Manual inspection of imprinted genes that have gone undetected by any of our methods reveals genes that are short, contain few heterozygous SNPs, or are expressed at a very low levels in LCL. Allelic imbalance resulting from cis-regulatory variation typically have allele ratios less extreme than imprinted genes and are thus more difficult to detect. A set of 61 transcripts (containing 1596 SNPs in our data set) with AI resulting from cis-regulatory variation in LCL have been identified and validated by Verlaan et al. [8]. Figure 6 (b) shows the fold-enrichment of these SNPs among those predicted as AI SNPs by each of our methods. Here, the predictions made by the two types of HMMs perform

significantly better than the Z-Score and smoothing approaches, detecting approximately 50% and 100% more validated SNPs. Overall, our best approach is again the left-to-right HMM, which predicts 87% (resp. 70%) of the 1596 validated SNPs as imbalanced in at least one (resp. five) individual(s). Inspection of AI genes that were undetected showed that they exhibited little evidence of allelic imbalance by our method (see Figure S3). These likely represent false positives in the earlier study, as well as more localized effects caused by few independent AI measurements and driving the association tests in previous analyses [13].
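The fold-enrichment used in Figure 6 (observed overlap divided by the overlap expected if the predicted sites had been drawn at random) can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `fold_enrichment`, the representation of SNPs as integer identifiers, and the toy numbers are all hypothetical.

```python
def fold_enrichment(predicted, annotated, n_total):
    """Ratio of the observed overlap to the overlap expected by chance.

    predicted : iterable of SNP identifiers called as AI (hypothetical input)
    annotated : iterable of SNP identifiers in the reference set
                (e.g. SNPs in imprinted regions)
    n_total   : total number of SNPs that could have been selected
    """
    predicted, annotated = set(predicted), set(annotated)
    observed = len(predicted & annotated)
    # If predictions were drawn at random, each predicted SNP falls in the
    # annotated set with probability len(annotated) / n_total.
    expected = len(predicted) * len(annotated) / n_total
    if expected == 0:
        raise ValueError("expected overlap is zero; enrichment is undefined")
    return observed / expected


# Toy example: 1000 SNPs, 100 predicted, 100 annotated, 50 in common.
# Expected overlap by chance is 100 * 100 / 1000 = 10, so enrichment is 5.
print(fold_enrichment(range(100), range(50, 150), 1000))
```

Sweeping the number of top-ranked predictions passed as `predicted` would give one point per threshold, which is the kind of curve described for Figure 6.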

Distribution of AI in the Genome and Across Individuals

Our predictions allow a first glimpse into the diversity of allelic expression patterns in the human genome, although a comprehensive analysis of AI regions is beyond the scope of this study. We first observe that AI in LCL samples is widespread, with on average 9.7% (resp. 5.6%) of an individual’s genes containing at least one (resp. all) imbalanced SNP (using the left-to-right HMM with a threshold corresponding to an FDR of 5%). Considered in total, 54.4% of genes show at least one imbalanced SNP in at least one individual, and 45.6% of genes have all of their SNPs showing allelic imbalance in at least one individual. Note that only approximately 50% of genes in total are detectably expressed in LCL [39], and hence candidates for being allelically imbalanced. Thus, the majority of expressed genes show AI in one or more individuals. Figure 7 reports the distribution of AI regions across various types of genomic regions. While a substantial fraction (19%) of AI

Figure 7. Classification of AI regions based on their overlap with annotated protein-coding genes. The classification of an AI region is done based on a set of simple rules that allow for a sizable margin of error in the boundaries of the AI regions. Intergenic: Little or no overlap with annotated genes. Multiple transcripts: Overlaps several genes. Exact transcript: The left and right boundaries of the AI region match gene boundaries within 20 kb. 5′ (resp. 3′) end of transcript: AI region is at the 5′ end (resp. 3′ end) of the gene only. Intronic: AI region is within the gene but away from the gene boundaries. Extended 5′ (resp. 3′): AI region extends upstream (resp. downstream) of the gene. doi:10.1371/journal.pcbi.1000849.g007


significant number of genes with allelic imbalance [13]. However, taking full advantage of this technology requires advanced signal processing approaches to accurately detect, delineate and quantify allelic expression. Furthermore, relying too heavily on known gene annotation may hide the fact that most AI does not perfectly align with gene boundaries. Indeed, the approaches proposed here, which do not make use of gene annotations, reveal that allelic imbalance is widespread and exhibits complex patterns in relation to annotated genes. Although our approach was specifically applied to the analysis of data obtained from high-density genotyping arrays, it should be readily applicable to studies based on data obtained from next-generation RNA sequencing. Detection of AI based on data from genotyping arrays proves challenging because of the significant noise in the allelic ratio measured at individual SNPs and because of the complex patterns of AI. To our knowledge, our study represents the first in-depth statistical and computational analysis of a large-scale, genome-wide allelic imbalance data set. Because of the noise level in allelic expression ratios at individual SNPs, one must rely on the fact that transcripts with allelic imbalance will generally contain several SNPs that are expected to show imbalance. Our Z-Score approach identifies regions where the allele ratio is significantly different from the expected one-to-one ratio. An aspect of the data that is not exploited by the Z-Score approach is that the total expression and allelic ratio are expected to be consistent across the transcript. Our two HMM approaches model this explicitly, and obtain better results in part because of this. An additional improvement in accuracy of AI detection is obtained by our left-to-right HMM, which jointly considers the data from all individuals to serve as a prior for the detection of AI in each one.
This approach yields improved detection of AI regions that are shared among many individuals, while remaining able to detect those present in only one or a few samples. This new type of machine learning problem, in which a collection of observation sequences is expected to derive from a common (but unknown) model from which each individual can deviate significantly, may arise in a number of other settings where our left-to-right HMM approach may be useful, including comparative genomics-based gene prediction [41] (where different species are expected to share some but not all of their exon structure). Although a detailed biological analysis of allelic imbalance and its phenotypic consequences is beyond the scope of this paper, our predictions reveal that AI is widespread, with roughly 10% of genes showing evidence of AI in a given individual, and with the majority of genes expressed in LCLs showing AI in at least one of our 53 samples. Although roughly 60% of AI regions are clearly related to an annotated transcript, they often reflect the presence of alternative promoters, splicing, or transcription termination. An increasing proportion of the genetic burden of disease is being associated with differences in gene regulation [42]. At the same time, greater complexity of gene regulation and of the transcriptome is being uncovered [43]. Therefore, hypothesis-free methods for detecting allelic imbalance are a prerequisite to advancing our understanding of population variation in cis-regulatory control by heritable or epigenetic mechanisms.

Figure 8. Commonality of allelic imbalance. Number of SNPs in AI regions, as a function of the number of individuals with AI at the same site. doi:10.1371/journal.pcbi.1000849.g008

regions closely match annotated gene boundaries, most exhibit more complex relationships to annotated protein-coding gene transcripts. A larger portion of AI regions (28%) are within annotated genes but cover only a fraction of the transcript. In nearly half of those, allelic expression is found toward the 3′ end of the gene, possibly because of allele-specific transcription termination or mRNA degradation, or the presence of an allele-specific alternate transcription start site within the annotated gene. The presence of AI regions at the 5′ end of the transcript appears somewhat less frequent. 22% have little or no overlap with protein-coding genes, although this fraction is enriched for other types of transcripts such as LINC-RNAs [40]. Our data set affords a first glimpse into the commonality of allelic imbalance at a given site across individuals. We calculated the number of individuals showing AI (based on the Viterbi predictions; see Figure 8). The very long tail of this distribution indicates that much AI is shared among a portion of the population. In fact, ~65% of an individual’s AI regions are found in at least 10 other individuals. Allelic imbalance, whether of genetic or epigenetic origin, is thus highly structured in the human population. On the other hand, rare AI, defined as that seen in at most 10% of our individuals, constitutes approximately 20% of an individual’s AI regions, while 4% are unique to that individual. We note however that because AI regions found in a large number of samples are easier to detect than those that are less common in the population, we may underestimate the proportion of AI that is found in a small number of individuals. We note that the left-to-right HMM predictions used for this analysis are potentially biased towards over-predicting sites with common AI and under-predicting those with rare AI. We thus repeated the analysis with the ergodic HMM approach, which does not suffer from this bias.
The results were very similar, with only a very slight shift toward less frequent AI.
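The commonality analysis described above amounts to counting, for each site, how many individuals carry an AI call, and then asking what fraction of one individual's AI sites recur in other individuals. The following Python sketch shows one way this could be computed. It is only an illustration: the data layout (one list of 0/1 calls per individual), the function names, and the toy matrix are assumptions, and the paper's own analysis works on HMM-derived AI regions rather than this simplified per-site table.

```python
def sharing_counts(calls):
    """Number of individuals with an AI call at each site.

    calls : list of per-individual lists of 0/1 AI calls, all of equal
            length (hypothetical layout: calls[individual][site]).
    """
    return [sum(column) for column in zip(*calls)]


def fraction_shared(calls, individual, min_others):
    """Fraction of one individual's AI sites seen in >= min_others others."""
    counts = sharing_counts(calls)
    own_sites = [i for i, call in enumerate(calls[individual]) if call]
    if not own_sites:
        return 0.0
    # Subtract 1 so that the individual itself is not counted as "other".
    shared = sum(1 for i in own_sites if counts[i] - 1 >= min_others)
    return shared / len(own_sites)


# Toy data: 3 individuals, 4 sites.
toy_calls = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 0, 1, 0],
]
print(sharing_counts(toy_calls))         # per-site number of individuals
print(fraction_shared(toy_calls, 0, 1))  # individual 0, at least 1 other
```

A histogram of `sharing_counts` over all sites corresponds to the kind of distribution plotted in Figure 8, and `fraction_shared` with `min_others=10` corresponds to the "at least 10 other individuals" statistic quoted in the text.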

Discussion

The recent development of a genome-wide high-density assay of allelic imbalance based on genotyping arrays has resulted in a vast improvement in our understanding of this type of variation and in our ability to map this variation to causative regulatory SNPs [13]. A relatively simple gene-based analysis was sufficient to identify a

Supporting Information

Figure S1 Analysis of the noise using technical replicates. (a) Replicability of expression value E. (b) Replicability of allelic ratio R. Found at: doi:10.1371/journal.pcbi.1000849.s001 (0.14 MB TIF)

Figure S2 Performance of ergodic HMM with different levels of discretization. False-discovery rate obtained by ergodic HMMs with 4, 6, 8, and 10 states (corresponding to 1, 2, 3 and 4 levels of positive and negative allelic imbalance). Found at: doi:10.1371/journal.pcbi.1000849.s002 (0.15 MB TIF)

Figure S3 Analysis of AI data in false-negative regions. Red: Genome-wide distribution of AI measurements (total expression vs allelic ratio). Green: AI measurements in genes identified as imbalanced by Verlaan et al. [8] but not predicted as such by our approach. These genes show no sign of imbalance in our data. Found at: doi:10.1371/journal.pcbi.1000849.s003 (0.62 MB TIF)

Acknowledgments

We thank Javad Sadri for useful discussions, as well as three anonymous reviewers for their suggestions.

Author Contributions

Conceived and designed the experiments: JRW TP MB. Performed the experiments: JRW. Analyzed the data: JRW. Contributed reagents/materials/analysis tools: BG DP KLG TP. Wrote the paper: JRW MB.

References

1. Pastinen T, Hudson T (2004) Cis-acting regulatory variation in the human genome. Science 306: 647–650.
2. Carrel L, Willard H (2005) X-inactivation profile reveals extensive variability in x-linked gene expression in females. Nature 434: 400–404.
3. Rockman MV, Kruglyak L (2006) Genetics of global gene expression. Nature Reviews Genetics 7: 862–872.
4. Pastinen T, Sladek R, Gurd S, Sammak A, Ge B, et al. (2004) A survey of genetic and epigenetic variation affecting human gene expression. Physiol Genomics 16: 184–193.
5. Pastinen T, Ge B, Gurd S, Gaudin T, Dore C, et al. (2005) Mapping common regulatory variants to human haplotypes. Hum Mol Genet 14: 3963–3971.
6. Serre D, Gurd S, Ge B, Sladek R, Sinnett D, et al. (2008) Global differential allelic expression in the human genome: A robust approach to identify genetic and epigenetic cis-acting mechanisms regulating gene expression. PLoS Genetics 4: e1000006.
7. Campino S, Forton J, Raj S, Mohr B, Auburn S, et al. (2008) Global validating discovered cis-acting regulatory genetic variants: Application of an allele specific expression approach to hapmap populations. PLoS One 3: e4105.
8. Verlaan DJ, Ge B, Grundberg E, Hoberman R, Lam KC, et al. (2009) Targeted screening of cis-regulatory variation in human haplotypes. Genome Research 19: 118–127.
9. Pollard KS, Serre D, Wang X, Tao H, Grundberg E, et al. (2008) A genome-wide approach to identifying novel-imprinted genes. Human Genetics 122: 625–634.
10. Gimelbrant A, Hutchinson J (2007) Widespread monoallelic expression on human autosomes. Science 318: 1136–1140.
11. Pant KPV, Tao H, Beilharz EJ, Ballinger DG, Cox DR, et al. (2006) Analysis of allelic differential expression in human white blood cells. Genome Research 16: 331–339.
12. Lo SH, Wang Z, Hu Y, Yang HH, Gere S, et al. (2003) Allelic variation in gene expression is common in the human genome. Genome Research 13: 1855–1862.
13. Ge B, Pokholok DK, Kwan T, Grundberg E, Morcos L, et al. (2009) Global patterns of cis variation in human cells revealed by high-density allelic expression analysis. Nature Genetics 41: 1216–1222.
14. International HapMap Consortium, Frazer KA, Ballinger DG, Cox DR, Hinds DA, et al. (2007) A second generation human haplotype map of over 3.1 million snps. Nature 449: 851–861.
15. Rueda O, Diaz-Uriarte R (2007) Flexible and accurate detection of genomic copy-number changes from aCGH. PLoS Comput Biol 3: e122.
16. Marioni J, Thorne N, Valsesia A, Fitzgerald T, Redon R, et al. (2007) Breaking the waves: improved detection of copy number variation from microarray-based comparative genomic hybridization. Genome Biol 8: R228.
17. Shah SP (2008) Computational methods for identification of recurrent copy number alteration patterns by array cgh. Cytogenetic and genome research 123: 343–351.
18. Shah SP, Xuan X, Deleeuw RJ, Khojasteh M, Lam WL, et al. (2006) Integrating copy number polymorphisms into array cgh analysis using a robust hmm. Bioinformatics 22.
19. Li C, Beroukhim R, Weir B, Winckler W, Garraway L, et al. (2008) Major copy proportion analysis of tumor samples using snp arrays. BMC Bioinformatics 9: 204.
20. Wu L, Zhou X, Li F, Yang X, Chang C, et al. (2009) Conditional random pattern algorithm for loh inference and segmentation. Bioinformatics 25(1): 61–7.
21. Yau C, Holmes C (2008) CNV discovery using SNP genotyping arrays. Cytogenet Genome Res 123(1–4): 307–12.
22. Baross A, Delaney A, Li H, Nayar T, Flibotte S, et al. (2007) Assessment of algorithms for high throughput detection of genomic copy number variation in oligonucleotide microarray data. BMC Bioinformatics 8: 368.
23. Nannya Y, Sanada M, Nakazaki K, Hosoya N, Wang L, et al. (2005) A robust algorithm for copy number detection using high-density oligonucleotide single nucleotide polymorphism genotyping arrays. Cancer Res 65(14): 6071–9.
24. Bengtsson H, Irizarry R, Carvalho B, Speed T (2008) Estimation and assessment of raw copy numbers at the single locus level. Bioinformatics 24(6): 759–767.
25. Bengtsson H, Wirapati P, Speed T (2009) A single-array preprocessing method for estimating full-resolution raw copy numbers from all affymetrix genotyping arrays including genomewideSNP 5 and 6. Bioinformatics 25(17): 2149–56.
26. Wang K, Li M, Hadley D, Liu R, Glessner J, et al. (2007) Penncnv: An integrated hidden markov model designed for high-resolution copy number variation detection in whole-genome snp genotyping data. Genome Research 17: 1665–1674.
27. Colella S, Yau C, Taylor J, Mirza G, Butler H, et al. (2007) QuantiSNP: an objective bayes hidden-markov model to detect and accurately map copy number variation using snp genotyping data. Nucleic Acids Res 35(6): 2013–25.
28. Venkatraman E, Olshen A (2007) A faster circular binary segmentation algorithm for the analysis of array cgh data. Bioinformatics 23(6): 657–663.
29. Fearnhead P (2006) Exact and efficient bayesian inference for multiple changepoint problems. Statistics and Computing 16: 203–213.
30. Browning S (2008) Missing data imputation and haplotype phase inference for genome-wide association studies. Human Genetics 124: 439–450.
31. Baum LE, Petrie T, Soules G, Weiss N (1970) A maximization technique occurring in the statistical analysis of probabilistic functions of markov chains. The Annals of Mathematical Statistics 41: 164–171.
32. Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the em algorithm. Journal of the Royal Statistical Society, Series B 39: 1–38.
33. Mitchell T (1997) Machine Learning. McGraw Hill.
34. Viterbi A (1967) Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory 13: 260–269.
35. Rabiner L (1989) A tutorial on hidden markov models and selected applications in speech recognition. Proceedings of the IEEE 77: 257–286.
36. Eddy SR (1998) Profile hidden markov models (review). Bioinformatics 14: 755–763.
37. Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, et al. (2002) The human genome browser at ucsc. Genome Res 12: 996–1006.
38. Kent W, Baertsch R, Hinrichs A, Miller W, Haussler D (2003) Evolution’s cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci USA 100(20): 11484–11489.
39. Cheung VG, Conlin LK, Weber TM, Arcaro M, Jen KY, et al. (2003) Natural variation in human gene expression assessed in lymphoblastoid cells. Nat Genet 33: 422–425.
40. Khalil AM, Guttman M, Huarte M, Garber M, Raj A, et al. (2009) Many human large intergenic noncoding rnas associate with chromatin-modifying complexes and affect gene expression. Proceedings of the National Academy of Sciences of the United States of America 106: 11667–11672.
41. Siepel A, Diekhans M, Brejova B, Langton L, Stevens M, et al. (2007) Targeted discovery of novel human exons by comparative genomics. Genome Research 17(12): 1763–73.
42. Cookson W, Liang L, Abecasis G, Moffatt M, Lathrop M (2009) Mapping complex disease traits with global gene expression. Nature reviews Genetics 10: 184–194.
43. ENCODE Project Consortium, Birney E, Stamatoyannopoulos JA, Dutta A, Guigó R, et al. (2007) Identification and analysis of functional elements in 1% of the human genome by the encode pilot project. Nature 447: 799–816.
44. Pruitt KD, Tatusova T, Maglott DR (2007) Ncbi reference sequences (refseq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 35.


Altering a Histone H3K4 Methylation Pathway in Glomerular Podocytes Promotes a Chronic Disease Phenotype

Gaelle M. Lefevre1¤a, Sanjeevkumar R. Patel2, Doyeob Kim1¤b, Lino Tessarollo3, Gregory R. Dressler1*

1 Department of Pathology, University of Michigan, Ann Arbor, Michigan, United States of America, 2 Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, United States of America, 3 Neural Development Section, National Cancer Institute, Frederick, Maryland, United States of America

Abstract

Methylation of specific lysine residues in core histone proteins is essential for embryonic development and can impart active and inactive epigenetic marks on chromatin domains. The ubiquitous nuclear protein PTIP is encoded by the Paxip1 gene and is an essential component of a histone H3 lysine 4 (H3K4) methyltransferase complex conserved in metazoans. In order to determine if PTIP and its associated complexes are necessary for maintaining stable gene expression patterns in a terminally differentiated, non-dividing cell, we conditionally deleted PTIP in glomerular podocytes in mice. Renal development and function were not impaired in young mice. However, older animals progressively exhibited proteinuria and podocyte ultrastructural defects similar to chronic glomerular disease. Loss of PTIP resulted in subtle changes in gene expression patterns prior to the onset of a renal disease phenotype. Chromatin immunoprecipitation showed a loss of PTIP binding and lower H3K4 methylation at the Ntrk3 (neurotrophic tyrosine kinase receptor, type 3) locus, whose expression was significantly reduced and whose function may be essential for podocyte foot process patterning. These data demonstrate that alterations or mutations in an epigenetic regulatory pathway can alter the phenotypes of differentiated cells and lead to a chronic disease state.

Citation: Lefevre GM, Patel SR, Kim D, Tessarollo L, Dressler GR (2010) Altering a Histone H3K4 Methylation Pathway in Glomerular Podocytes Promotes a Chronic Disease Phenotype. PLoS Genet 6(10): e1001142. doi:10.1371/journal.pgen.1001142

Editor: Veronica van Heyningen, Medical Research Council Human Genetics Unit, United Kingdom

Received March 30, 2010; Accepted September 28, 2010; Published October 28, 2010

This is an open-access article distributed under the terms of the Creative Commons Public Domain declaration which stipulates that, once placed in the public domain, this work may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose.

Funding: This work was supported by NIH grant DK073722 and DK054740 to GRD. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: dressler@umich.edu

¤a Current address: Laboratory of Molecular Biology, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland, United States of America

¤b Current address: CDI Biosciences, Madison, Wisconsin, United States of America

Introduction

The process of embryonic development determines the differentiated state of all cells by establishing unique gene expression patterns, or signatures, for individual cell types that define their phenotypes. Once a differentiated state is established, it is difficult to erase that epigenetic imprint and reprogram the cell towards a different cell lineage or phenotype. Although reprogramming can be forced by nuclear transplantation [1] or by the expression of Oct4 and accessory factors [2,3], the low efficiency of these processes speaks to the inherent stability of a differentiated cell. Gene expression patterns must be established and maintained by compartmentalizing the genome into active and inactive regions, which is thought to occur through the covalent modifications of DNA and its associated nucleosomes. Such modifications include DNA methylation of CpG islands and methylation, acetylation, and ubiquitination of histone tails, all of which are thought to determine chromatin structure and accessibility [4,5]. This epigenetic code is thus imprinted upon the primary genetic code during embryonic development to help establish cell lineages and restrict fate. The genetics and biochemistry of histone modifications have been well studied in a variety of model organisms and developmental contexts. Genes of the Polycomb and Trithorax families encode proteins that are required for methylation of different histone lysine residues and often correlate with gene silencing or activation, respectively [6–9]. Many Trithorax group proteins, such as Drosophila TRX and human KMT2A (MLL), are histone H3 lysine 4 (H3K4) methyltransferases (KMTs) and are essential for maintaining gene expression patterns in diverse organisms. Recently, we discovered a novel co-factor, PTIP (Pax Transactivation-domain Interacting Protein), which is encoded by the Paxip1 gene. The PTIP protein co-purifies with the mammalian lysine methyltransferases KMT2B and KMT2C (formerly ALR and MLL3), is broadly expressed, and is essential for embryonic development [10–12]. At least in one case, PTIP is able to recruit the KMT2B complex to a developmental DNA binding protein in a locus specific manner [13]. Loss of PTIP function in the mouse results in gross developmental effects at gastrulation, with reduced levels of global H3K4 di- (me2) and trimethylation (me3) observed [13,14]. In cultured mouse embryonic stem cells, PTIP is needed to maintain pluripotency, Oct4 expression, and normal levels of H3K4 trimethylation [15]. Similarly, in neuronal stem cells, differentiation is abrogated and levels of H3K4 methylation are reduced in tissue specific PTIP knockouts [13]. In mouse embryo fibroblasts, loss of PTIP blocks

PLoS Genetics | www.plosgenetics.org

October 2010 | Volume 6 | Issue 10 | e1001142


H3K4 Methylation and Chronic Glomerular Disease

function had not been previously studied in podocytes. Our results demonstrate a maintenance function for PTIP-mediated H3K4 methylation and identify a novel role for Ntrk3 in podocyte foot process patterning.

Author Summary

While all cells contain essentially the same genome, adult differentiated cells have specific patterns of gene expression for unique physiological functions. Gene expression depends on specific proteins that activate some genes and repress others so that a stable pattern of expression is maintained. During embryonic development, epigenetic modifications of the genome may compartmentalize the genome into actively expressed or repressed domains through the methylation of specific histone residues on chromatin. We studied a specific pathway of histone H3 lysine 4 methylation by deleting the co-factor PTIP in a differentiated cell type. We then asked whether this epigenetic pathway is still important for maintaining the correct pattern of gene expression. Using the podocyte cells of the glomerulus as a model system, mice that carry deletions of the PTIP protein only in these podocytes show changes in gene expression patterns over time and exhibit a slowly progressing chronic disease phenotype. Chromatin immunoprecipitation showed a loss of PTIP binding and lower H3K4 methylation at the Ntrk3 locus, whose expression was significantly reduced. These data demonstrate the need for maintaining the correct epigenetic pattern in an aging, differentiated cell type and point to modifications in epigenetics as potential disease causing factors.

Results

Generation of a Podocyte-Specific Paxip1 Deletion

To specifically knock out PTIP protein in fully differentiated mouse podocytes, we utilized both floxed (fl) and conventional null (−) alleles of Paxip1 and a Cre driver strain specific for glomerular podocytes. The Paxip1fl/−:CreNPHS2 mice were crossed to Paxip1fl/fl animals to generate Paxip1fl/fl or Paxip1fl/− with or without CreNPHS2. The CreNPHS2 mice utilize the NPHS2 promoter to express Cre recombinase only in late developing and mature podocytes [24,25]. The resulting progenies were born in the expected Mendelian ratios and did not show any gross kidney defects during the first 4 weeks of life (data not shown). For simplicity, we will refer to the mice as either PTIP− (Paxip1fl/−:CreNPHS2; Paxip1fl/fl:CreNPHS2) or PTIP+ (Paxip1−/fl or Paxip1fl/fl). PCR analysis indicated that recombination occurred at the Paxip1 locus in DNAs isolated from kidneys but not in DNAs from tails (Figure 1A). Previous work established that the Paxip1fl allele produces normal levels of protein, but Cre-mediated excision of exon 1 and the promoter region results in complete absence of PTIP protein, essentially creating a null allele [13,15]. The specificity of the Cre driver strain was confirmed by crossing CreNPHS2 mice to the Rosa26-LacZ reporter mice (Figure 1B). In 1 month old kidneys, lacZ expression was restricted to the glomerulus only, indicating efficient Cre mediated excision at this time. Immunostaining for PTIP and the podocyte marker WT1 also confirmed that PTIP protein levels were reduced only in the podocyte cells and not the mesangial or endothelial components of the glomerular tuft (Figure 1C). Previous work showed that a loss of PTIP function results in reduced levels of total H3K4me3 in embryos and cultured cells [13–17]. To test whether podocytes showed reduced H3K4me3, we stained kidney sections with antibodies specific for this modification (Figure 1D). Many podocytes were observed with reduced signal intensities. To quantitate this effect, images were analyzed for signal intensity by integrating a fixed area over the nuclei of both podocytes and other cell types (Figure 1E). Podocytes were co-stained with WT1 antibodies. The ratio of podocyte signal (WT1+) to other cell types (WT1−) was calculated by counting at least 6 cells of each type per glomerulus. The ratios from at least 8 glomeruli were averaged for each genotype and shown to decrease by more than 20% in PTIP− kidneys compared to PTIP+ controls (p<0.01). These data confirmed that the specific deletion of PTIP in the podocytes correlates with a reduction in H3K4me3 in this cell type.
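The quantitation described above (a per-glomerulus ratio of podocyte to non-podocyte nuclear signal, averaged over glomeruli and compared between genotypes) can be sketched as follows. This is a hedged illustration only: the function names, the toy intensity values, and the choice to average cell intensities before forming the ratio are assumptions, and the sketch computes only the pooled-variance Student's t statistic rather than reproducing the authors' image analysis.

```python
from math import sqrt
from statistics import mean, variance


def glomerulus_ratio(podocyte_signals, other_signals):
    """Mean WT1+ (podocyte) nuclear signal over mean WT1- nuclear signal."""
    return mean(podocyte_signals) / mean(other_signals)


def percent_reduction(control_ratios, mutant_ratios):
    """Percent decrease of the mean mutant ratio relative to control."""
    return 100.0 * (1.0 - mean(mutant_ratios) / mean(control_ratios))


def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1.0 / na + 1.0 / nb))


# Toy, made-up ratios for a few glomeruli per genotype (not data from the paper).
control = [1.00, 0.95, 1.05, 1.02]
mutant = [0.78, 0.74, 0.80, 0.76]
print(percent_reduction(control, mutant))
print(student_t(control, mutant))
```

If SciPy is available, `scipy.stats.ttest_ind(control, mutant)` returns the same statistic together with a p-value; it is left out here to keep the sketch dependent on the standard library only.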

differentiation by inhibiting PPARγ and C/EBPα activation and H3K4 methylation at their respective promoters [16]. Similarly, the Drosophila homologue of PTIP is also essential for development, epigenetic control of gene expression, and global histone H3K4 methylation [17]. During cell division, patterns of histone methylation must be inherited by daughter cells such that the cellular phenotype is maintained. For repressive histone methylation marks, such as histone H3 lysine 27, the EED (Embryonic Ectodermal Development) protein is thought to bind and recruit the Polycomb Repressor Complex 2 to replicate and maintain gene silencing after mitotic cell division [18,19]. For highly expressed genes, the KMT2A (MLL1) protein associates with promoter regions on condensed mitotic chromatin and is required to rapidly reactivate such genes after cell division [20]. These data suggest a model whereby histone methylation patterns are replicated during mitosis, but do not address the necessity for maintaining epigenetic modifications in terminally differentiated, non-dividing cells. Furthermore, changes in the expression of epigenetic regulatory genes have been reported in a variety of cancers [21] and disease states [22], but whether these are the cause or the result of disease remains to be determined. To address the necessity of H3K4me3 in a stable non-dividing cell type, we utilized a Podocin-Cre transgenic driver to delete PTIP in the glomerular podocyte, a highly specialized and architecturally distinct cell that establishes the kidney filtration barrier. Podocytes are clinically relevant cells whose properties and expression profiles change in glomerular diseases and in older animals [23].
While the ubiquitous expression of PTIP, its role in H3K4 methylation, and its necessity in development and differentiation are all well established, whether PTIP deletion in terminally differentiated cells can induce changes in the pattern of H3K4me3 and gene expression has not been demonstrated. We show that loss of PTIP results in changes in the transcriptional profile of terminally differentiated podocyte cells, which ultimately leads to a chronic glomerular disease phenotype. Among the most affected is the neurotrophin receptor encoding gene Ntrk3, whose

Development of a Chronic Glomerular Disease Phenotype

Podocytes play a critical role in the establishment and maintenance of the glomerular filtration barrier. Interdigitated podocyte foot processes cover the glomerular basement membrane and form specialized junctions, called slit diaphragms, which create a highly selective barrier that filters small and negatively charged proteins and solutes from the blood to the urinary space. Damage to or loss of podocytes impairs the filtration barrier and results in increased rates of excretion of high molecular weight proteins, such as albumin, into the urine. Thus, we checked mice for proteinuria beginning at 1 month of age (Figure 2A). At 1 month, low levels of albumin were detected in the urine but these were not significantly different between PTIP+ and

Figure 1. Generation of a Podocyte-Specific Paxip1 Deletion. A) PCR genotyping with primer pairs specific for the excised, null allele indicates Paxip1 excision only in the kidney DNA and only in mice carrying the CreNPHS2 transgene. B) Enzymatic staining for β-galactosidase activity (blue) in kidney sections from 1 month old mice with the indicated genotypes. C) Immunostaining for WT1 (green) and PTIP (red) in glomeruli at 3 months of age shows reduced PTIP signals in the WT1 positive cells (arrows) of PTIP− kidneys compared to PTIP+ control littermates. The overlays were counterstained with DAPI to mark all nuclei. Thus double positives (WT1 and PTIP) are light purple whereas single positives (WT1 only) are green. D) Immunostaining for H3K4me3 and WT1 in kidneys of 3 month old PTIP+ and PTIP− mice. Note reduced intensity of podocyte cells (arrows) in PTIP− mice, when compared to other cells on the sections. E) Image analysis of immunostaining for H3K4me3 from 3 month old PTIP+ and PTIP− mice. The total signal strength was calculated by integrating over a fixed area and the data are expressed as the ratio of podocytes to mesangial and endothelial cell signals. Mean ratios from 6 podocytes and 6 other cell types were calculated from 8 independent samples for each genotype. Error bars are one standard deviation from the mean. The p value was calculated by Student’s t-test for 2 independent variables. doi:10.1371/journal.pgen.1001142.g001

patterning (Figure 4B, 4C). Transmission electron micrographs at 3 months also revealed that the slit diaphragms were not evenly spaced and fusion of foot processes was frequent (Figure 4D–4F). By 12 months, the remaining podocytes in the PTIP− kidneys were broader, flatter and displayed significant fusion or effacement (Figure 4G, 4H), consistent with the high levels of albumin detected in the urine. These data demonstrate that the initial glomerular phenotype in PTIP− kidneys is due primarily to differences in podocyte foot process morphology, which occurs prior to the loss of cell bodies.

PTIP− animals. However, by 3 months of age the PTIP− mice showed significantly higher levels of albumin in the urine and these levels increased further at 6 and 12 months. The urine albumin to creatinine ratio (ACR) provides a quantitative assay that correlates with filtration barrier integrity. No significant differences were observed at 1 month (Figure 2B). However, by 3 and 12 months, ACRs were 10- and 30-fold higher, respectively, in the urine of PTIP− animals compared to PTIP+ mice. Mice that carried the CreNPHS2 transgene in a Paxip1+/+ or a Paxip1fl/+ genetic background did not show any renal abnormalities at 12 months (data not shown), consistent with many published reports that have used this particular Cre driver strain [25–28].

Renal pathology was characterized by light microscopy at 1, 3, and 12 months of age. Standard Masson’s Trichrome and Periodic acid-Schiff stainings revealed significant sclerosis and matrix deposition in 12-month-old glomeruli from PTIP− animals (Figure 2C). However, 3-month-old kidneys did not show significant differences for most glomerular sections at the light microscopy level, although evidence of limited matrix expansion could be observed in a small number of glomeruli of PTIP− kidneys. In 12-month-old kidneys, significant interstitial fibrosis and protein-filled cysts were also observed (Figure 2D). These are likely to be secondary effects due to the glomerular pathology.

Glomerular pathology and increased albuminuria can be the direct result of podocyte death [29]. Thus, we used a variety of markers to characterize the glomerular architecture and the numbers of podocyte cells at various ages to ensure that the phenotype of the PTIP− mice was not just the result of early podocyte cell death. Immunostaining with WT1, Nephrin, and Podocin antibodies enabled us to determine the podocyte numbers, as average per mid-cross section, and to indirectly assess the integrity of the slit diaphragm (Figure 3).
The number of WT1-positive podocytes was not significantly different between PTIP+ and PTIP− glomeruli at 1 or 3 months of age. At 6 months, PTIP− glomeruli had slightly fewer podocytes and by 12 months, the number of podocytes was half that of the PTIP+ littermates. Immunostainings for podocyte markers such as WT1, Nephrin, and Podocin did not reveal dramatic differences at 1 or 3 months, despite the increase in proteinuria, although some discontinuous staining could be seen with Podocin antibodies in PTIP− glomeruli (Figure 3B). Consistent with these data, TUNEL staining for apoptosis did not reveal differences between PTIP+ and PTIP− kidneys at 1 or 3 months of age (data not shown). Thus, the breakdown of the filtration barrier was not due to simple podocyte depletion at these early times. However, by 12 months of age, the extensive network of Nephrin staining was partially depleted in PTIP− glomeruli (Figure 3B). At the light microscopy level, the effects of PTIP loss on glomerular architecture seemed minimal at 3 months of age, yet the levels of albumin in the urine suggested significant functional defects. Thus, we utilized scanning and transmission electron microscopy to characterize the podocytes at the ultrastructural level (Figure 4). Scanning electron micrographs revealed disorganized foot processes at 3 months. While PTIP+ podocytes had regularly arrayed tertiary foot processes that were almost parallel (Figure 4A), the PTIP− podocyte foot processes were much more irregular and flattened. The parallel pattern of interdigitation was clearly different and resembled a jigsaw puzzle with random

Alteration of the Gene Expression Program Precedes the Disease Phenotype

Alterations in cellular phenotypes could be the result of changes in the transcriptional program of PTIP− podocytes. Thus, we prepared RNA from glomeruli-enriched fractions at 1 month of age, prior to the onset of any significant phenotype, and assayed for gene expression changes by Affymetrix microarrays. We compared glomerular RNA preps from 10 independent PTIP− animals and 8 PTIP+ littermates at 1 month of age. The data were highly consistent and indicated both gain and loss of gene expression in the PTIP− kidneys (Table 1 and Table 2). The entire dataset can be accessed at the Gene Expression Omnibus (GSE17709). Expression changes were confirmed by quantitative RT-PCR for selected genes (Figure 5). Among the genes increased was Protamine1 (Prm1), which is not normally expressed in podocytes or other somatic cells but is found only in spermatids where it is essential for chromatin condensation and fertility [30,31]. The changes in RNA expression observed were surprising and did not correspond to any common pathways. In fact, the podocyte-specific genes that are known to function in cell viability and slit diaphragm integrity were largely unchanged (Table S1 and Figure 5C). The data suggest that loss of PTIP in podocytes alters the transcriptional program to affect a limited number of genes whose functions in the podocytes have not been previously characterized.

PTIP Deletion Affects Ntrk3 Expression and Histone Methylation

Among the most interesting genes whose expression was down-regulated in PTIP− kidneys was the neurotrophic tyrosine kinase receptor type 3 (Ntrk3, formerly called TrkC), whose expression in podocytes had not been previously described. The Ntrk3 gene encodes two proteins that recognize neurotrophin 3 (NT-3) and function in axon guidance and innervation and in cardiac development [32–34]. Ntrk3 promotes axon outgrowth and guidance, presumably through actin-based extension and retraction of cellular processes [35]. Given that podocyte foot processes are also actin based and may require some type of guidance, we examined the role of Ntrk3 further. Quantitative RT-PCR confirmed that Ntrk3 expression was down approximately 10-fold in glomerular preps from PTIP− compared to PTIP+ animals (Figure 5A). We also examined Ntrk3 levels in kidneys by co-immunostaining kidney sections with Ntrk3, WT1 and Nephrin antibodies (Figure 6). At 3 months of age, Ntrk3 could be seen in glomeruli of PTIP+ kidneys; however, the staining intensity in PTIP− kidneys was severely reduced in almost every glomerulus examined (Figure 6D, 6J). Some slight filamentous staining


Figure 2. Chronic Glomerular Disease in PTIP− Kidneys. A) Coomassie blue staining of SDS-PAGE gels of urine samples from PTIP+ and PTIP− mice at 1 month and 3 months. Mouse albumin (al) is shown as a control. B) Urine albumin to creatinine ratios (ACR) as measured at 1, 3, and 12 months of age in PTIP+ and PTIP− animals. C) Histological sections from kidneys at 3 and 12 months. Representative glomerular sections were stained with Masson’s Trichrome (3 and 12 months) or Periodic acid-Schiff (12 months). Significant matrix deposition was observed in 12-month-old PTIP− glomeruli. D) Low power view of a kidney section at 12 months of age shows tubulointerstitial fibrosis, protein-filled cysts, and glomerular sclerosis in PTIP− animals. doi:10.1371/journal.pgen.1001142.g002

mature glomeruli, those located closest to the medullary zone. Podocyte foot processes from Ntrk3−/− mice exhibited disorganized secondary and tertiary processes that crisscrossed randomly over capillary vessels and were poorly interdigitated (Figure 9A′, 9B′). Few sections showed the characteristic spacing indicative of the slit diaphragms at the glomerular basement membranes (Figure 9D′). These data suggest a critical role for Ntrk3 in the fine patterning events of secondary and tertiary foot process formation and interdigitation.

remained in the PTIP− glomeruli, but the overall intensity was markedly different. In PTIP+ glomeruli, Ntrk3 staining was remarkably similar to Nephrin (Figure 6G–6I). However, Nephrin staining intensity was unaffected in PTIP− glomeruli even though Ntrk3 was much lower (Figure 6J–6L). The Ntrk3 expression in glomerular preps and its decrease in the PTIP− kidneys suggested a function in foot process growth, guidance, and/or pattern formation. In order to more directly link PTIP to the Ntrk3 locus, we designed chromatin immunoprecipitation experiments to examine the presence of PTIP and the changes in histone methylation patterns around the transcription initiation site (+1) of Ntrk3 (Figure 7). Chromatin was prepared from whole glomerular preps from PTIP+ and PTIP− kidneys, which also included mesangial and endothelial cells. Despite the presence of other cell types in the glomerular chromatin, we were able to detect a 5–6 fold decrease in PTIP localization to sequences around the start site of Ntrk3 transcription when comparing PTIP+ to PTIP− chromatin (Figure 7B). No significant amount of PTIP was detected further upstream (−1200), nor did we see a significant difference, between PTIP+ and PTIP− chromatin, in PTIP localization within the 5′ UTR of exon 1 (Figure 7B, P4 site). Clear differences in H3K4me2 were also measured, with an approximately 50–60% decrease in PTIP− chromatin with primer pairs P2–P4, but not with P1 at −1200 (Figure 7C). Similarly, H3K4me3 levels were also decreased in PTIP− chromatin at P2–P4 but not at P1 (Figure 7D). We also examined changes in Polycomb-mediated epigenetic silencing marks using an antibody against H3K27me3 (Figure 7E), which appeared unchanged at all sites examined. These data demonstrate recruitment of PTIP to the promoter region of Ntrk3 in normal glomeruli.

Discussion

In this report, we utilized a conditional deletion to ask whether the PTIP-dependent H3K4 methylation function is required in a terminally differentiated cell type, to maintain its differentiated state and its cell-type specific transcriptional program. Using the glomerular podocyte cell as a model, we show that deletion of PTIP results in subtle changes in gene expression patterns that ultimately lead to a slowly progressing disease state. These data support a model in which the gross stability of the differentiated state or podocyte cell survival, at least in the short term, does not depend on the PTIP/KMT complex, as many of the podocyte-specific genes examined were unchanged in the absence of PTIP. Rather, the loss of PTIP was more subtle and revealed unexpected changes in a small number of genes and ultimately led to a chronic disease phenotype resembling glomerular sclerosis. Typical characteristics of chronic glomerular disease were present, including microalbuminuria, podocyte foot process fusion or effacement, remodeling of the filtration barrier, and increased extracellular matrix deposition.

Methylation of histone H3 at lysine 4 correlates with gene expression and is thought to regulate cellular identity by establishing and maintaining a stable epigenetic state. The PTIP protein is part of an H3K4 methyltransferase complex that includes the mammalian Trithorax homologues KMT2B and/or KMT2C [10,11,13,16]. Previous studies in flies and mice demonstrated reduced H3K4 methylation in Paxip1 mutants and severe early lethal phenotypes. In the mouse, complete loss of PTIP protein results in developmental arrest just after gastrulation [14], a phenotype more severe than any individual mouse KMT2 family gene mutation [12,36,37], whereas a hypomorphic Paxip1 allele is lethal later in development [38]. In flies, maternal and zygotic ptip null embryos are embryonic lethal and fail to express many segmentation genes [17].
In mouse embryonic stem cells, PTIP protein is required for normal levels of H3K4 methylation and for maintaining pluripotency in cell culture [15], whereas in embryonic fibroblasts PTIP is required for adipocyte differentiation [16]. All of these findings suggest that a PTIP H3K4 methyltransferase complex is needed for differentiation of stem cells and progenitor cells in development. However in terminally differentiated cells, the requirement for active H3K4 methylation may be different and the lack of cell division may abrogate the need for de novo methylation. Our results suggest that PTIP must still function in some non-dividing cells, perhaps as part of a maintenance complex, as overall levels of H3K4 methylation were reduced and activation and suppression of a small number of genes was affected.

Ntrk3 Mutants Have Podocyte Foot Process Defects

In order to determine if the loss of Ntrk3 alone would impact normal glomerular patterning, we examined homozygous Ntrk3 mutant mice. The Ntrk3 mutants die shortly after birth due to cardiac and neuromuscular defects; however, their kidneys had not been studied previously. Therefore, we collected urine and kidney tissue for light and electron microscopy from 3–4 day old Ntrk3 mutants and littermates. At three days post partum, Ntrk3 mutants were small and sickly. Higher levels of albumin could be observed in the urine of Ntrk3−/− pups (Figure 8A), compared to control littermates, although this could be due to delayed or arrested kidney development. Glomerular development was examined in kidney sections of 4-day-old newborns (Figure 8B). At this time, nephrons are still undergoing development and glomeruli at the periphery are just beginning to form whereas cortical glomeruli closer to the medulla are already fully functional. The tight junction protein Magi2 specifically localizes to podocyte cell junctions and exhibited altered patterning in Ntrk3 mutant kidneys, with discontinuous staining and excessive looping of the developing tuft. In mature glomeruli, Nephrin staining was reduced and patchy in the Ntrk3 mutants. The number of podocytes did not seem affected in the Ntrk3−/− mice at this time. Ultrastructural analysis of Ntrk3 mutant kidneys revealed podocyte patterning defects both by scanning and transmission EM (Figure 9). At 4 days post partum, we examined the most


Figure 3. Podocyte Viability and Glomerular Morphology. A) After immunostaining with WT1 and Nephrin antibodies, podocyte nuclei were counted in mid-cross sections through glomeruli whose vascular and proximal tubular poles were visible. Glomerular surface area for mid-cross sections was measured by morphometry and is expressed in relative units. B) Immunostaining for WT1 (pink) and Nephrin (green) at 3 months of age shows little significant difference between PTIP+ and PTIP− glomeruli. However, Podocin staining (green, lower panels) appears weaker and discontinuous in PTIP− glomeruli. Nuclei were counterstained with DAPI. By 12 months, large regions cleared of Nephrin-positive staining were evident within the glomerular tufts of PTIP− animals. doi:10.1371/journal.pgen.1001142.g003


Figure 4. Ultrastructural Analysis of PTIP− Kidneys. Podocytes of PTIP− mice showed progressive foot process disorganization and effacement, as observed by scanning (A–C, G, H) and transmission (D–F, I, J) electron microscopy. Podocyte foot processes of 3-month-old PTIP+ mice were regularly interdigitated (A, D, G), whereas those of age-matched PTIP− podocytes (B, C, E, F, H) displayed varying degrees of disorganization (B, E) and effacement (C, F). Note that slit diaphragms could still be observed between foot processes during the early stages of disorganization (E, arrows). G–J) In addition to the foot process alterations, capillary loop deformation/enlargement (H, J) and mesangium expansion (J, asterisks) were observed in glomeruli of 12-month-old (G, H) and 3-month-old (I, J) mice analyzed by EM. Scale bars: (A–C) 1 µm; (D–F) 100 nm; (G–J) 2 µm. doi:10.1371/journal.pgen.1001142.g004

The mature podocyte is generally believed to be a non-dividing cell type, as classic cell BrdU labeling experiments do not mark this population over time [39]. However, more recent genetic lineage tracing experiments suggest that there is a population of parietal epithelial cells at the vascular pole of the Bowman’s capsule that can replenish podocytes over time [40,41]. This replacement of podocytes appears slow under normal conditions, but may be especially critical in cases of glomerular injury. In our animal model, we would expect any podocyte replacement to also delete the Paxip1 gene once expression of the Cre driver is activated. Given that we do not see significant loss of podocytes until at least 6 months of age, it may be that alterations in the transcriptional profile are not lethal. Rather, loss of podocytes may be the result of the damaged filtration barrier, the increase in the mesangium, and the general environment of the glomerulus in older mice. Alternatively, if podocyte replacement is accelerated in our model, it may be that by 6 months the ability of parietal cells to replenish the podocyte population is exhausted. In either case, the effects of manipulating the H3K4 methylation pathway are more apparent in older mice, suggesting a critical role for such epigenetic pathways in aging cells and tissues. The changes in gene expression observed in response to PTIP deletion are surprising in that most of the well-characterized podocyte-specific genes appear unaffected. However, changes

include both activation and suppression of previously uncharacterized genes in the podocytes. Activation of the Prm1 gene in PTIP− kidneys is unusual as this gene has only been associated with sperm maturation and is thought to encode a unique chromatin binding protein [31,42]. Activation of the Padi4 gene could impact gene expression by deimination of arginines in the histone H3 tail, which prevents methylation [43]. The impact of increased Padi4 is likely to be complex as arginine methylation can correlate with gene activation or repression, depending on the context and specific residues. The most compelling gene affected in PTIP− podocytes was Ntrk3, whose expression in the glomerulus had not been previously characterized. The reduction of Ntrk3 expression in PTIP− kidneys and the phenotype of Ntrk3−/− newborn kidneys suggest that this receptor is critical for tertiary foot process pattern formation. The podocyte is a highly specialized cell with a complex network of processes that cover the glomerular basement membrane. The large primary processes are microtubule-containing structures, whereas the tertiary, interdigitated foot processes contain actin microfilaments [44]. Adjacent foot processes are connected through a specialized junctional complex, called the slit diaphragm, which is essential for maintaining a functional filtration pore. Some of the essential proteins in the slit diaphragm, such as Nephrin, Podocin, and Neph1 are well characterized and


Table 1. Genes Up-Regulated in PTIP− Podocytes.

Probe | Symbol | Description | UniGene | p-value | Fold Change*
1439379 | Prm1 | protamine 1 | Mm.42733 | 0 | 6.38
1418398 | Tspan32 | tetraspanin 32 | Mm.28172 | 0 | 2.92
1422760 | Padi4 | peptidyl arginine deiminase, type IV | Mm.250358 | 0.001 | 1.9
1433744 | Lrtm2 | leucine-rich repeats and transmembrane domains 2 | Mm.121498 | 0 | 1.89
1433529 | E430002G05Rik | RIKEN cDNA E430002G05 gene | Mm.28649 | 0 | 1.77
1436329 | Egr3 | early growth response 3 | Mm.103737, | 0 | 1.74
1449071 | Myl7 | myosin, light polypeptide 7, regulatory | Mm.46514 | 0.001 | 1.68
1419527 | Comp | cartilage oligomeric matrix protein | Mm.45071 | 0 | 1.58
1419487 | Mybph | myosin binding protein H | Mm.379067 | 0.001 | 1.3
1431991 | 2410004P03Rik | RIKEN cDNA 2410004P03 gene | Mm.159048 | 0 | 1.26
1430062 | Hhipl1 | hedgehog interacting protein-like 1 | Mm.36423 | 0.004 | 1.19
1453228 | Stx11 | syntaxin 11 | Mm.248648 | 0.003 | 1.16
1416077 | Adm | adrenomedullin | Mm.1408 | 0 | 1.14
1457780 | Stx11 | syntaxin 11 | Mm.248648 | 0.001 | 1.1
1434984 | 6330514A18Rik | RIKEN cDNA 6330514A18 gene | Mm.17613 | 0.004 | 1.08
1453152 | Mamdc2 | MAM domain containing 2 | Mm.50841 | 0.012 | 1.05
1435830 | 5430435G22Rik | RIKEN cDNA 5430435G22 gene | Mm.44508 | 0.002 | 1.01
1439761 | D830026I12Rik | RIKEN cDNA D830026I12 gene | Mm.136046 | 0.008 | 1

*log2 scale. doi:10.1371/journal.pgen.1001142.t001
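
Because the fold changes in Table 1 are reported on a log2 scale, converting them to linear ratios can make them easier to compare with other measurements. The Python sketch below is illustrative only and is not part of the original analysis. The function name `log2_to_fold` and the convention of reporting decreases as negative fold values are assumptions made here; only the three log2 values are taken from Table 1.

```python
def log2_to_fold(log2_fc: float) -> float:
    """Convert a log2 ratio to a signed linear fold change.

    Assumed convention: increases are returned as 2**x, and decreases are
    returned as -(2**|x|), so -1.0 on the log2 scale becomes -2.0.
    """
    if log2_fc >= 0:
        return 2.0 ** log2_fc
    return -(2.0 ** -log2_fc)


# log2 fold changes copied from Table 1 (up-regulated genes).
table1_log2 = {"Prm1": 6.38, "Tspan32": 2.92, "Padi4": 1.9}

for symbol, value in table1_log2.items():
    linear = log2_to_fold(value)
    print(f"{symbol}: log2 = {value:+.2f}, linear = {linear:.1f}-fold")
```

On this reading, the Prm1 value of 6.38 corresponds to an increase of roughly 83-fold, whereas a value of 1 at the bottom of the table corresponds to a doubling.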

Table 2. Genes Down-Regulated in PTIP− Podocytes.

Probe | Symbol | Description | UniGene | p-value | Fold Change*
1425425 | Wif1 | Wnt inhibitory factor 1 | Mm.32831 | 0 | −4.37
1441491 | A330068G13Rik | RIKEN cDNA A330068G13 gene | Mm.227543 | 0 | −3.68
1433825 | Ntrk3 | neurotrophic tyrosine kinase, receptor, type 3 | Mm.33496 | 0 | −3.09
1446622 | A330068G13Rik | RIKEN cDNA A330068G13 gene | Mm.227543 | 0 | −2.14
1452779 | 3110006E14Rik | RIKEN cDNA 3110006E14 gene | Mm.23960 | 0 | −1.57
1452416 | Il6ra | interleukin 6 receptor, alpha | Mm.2856 | 0 | −1.55
1420903 | St6galnac3 | | Mm.440929 | 0 | −1.53
1450309 | Astn2 | astrotactin 2 | Mm.445312 | 0 | −1.53
1433939 | Aff3 | AF4/FMR2 family, member 3 | Mm.336679 | 0 | −1.53
1437403 | Samd5 | sterile alpha motif domain containing 5 | Mm.101115 | 0.001 | −1.48
1429896 | 5830408B19Rik | RIKEN cDNA 5830408B19 gene | Mm.291322 | 0 | −1.35
1455296 | Adcy5 | adenylate cyclase 5 | Mm.41137 | 0 | −1.3
1431946 | Necab3 | N-terminal EF-hand calcium binding protein 3 | Mm.143748 | 0 | −1.29
1434777 | Mycl1 | v-myc myelocytomatosis viral oncogene homolog 1 | Mm.1055 | 0 | −1.26
1419139 | Gdf5 | growth differentiation factor 5 | Mm.4744 | 0.001 | −1.25
1441559 | LOC627626 | similar to CG11212-PA | Mm.390999 | 0.003 | −1.25
1441667 | Smyd1 | SET and MYND domain containing 1 | Mm.234274 | 0 | −1.23
1423561 | Nell2 | NEL-like 2 (chicken) | Mm.3959 | 0.016 | −1.18
1450501 | Itga2 | integrin alpha 2 | Mm.5007 | 0 | −1.17
1435832 | Lrrc4 | leucine rich repeat containing 4 | Mm.443660 | 0 | −1.11
1455188 | Ephb1 | Eph receptor B1 | Mm.22897 | 0.046 | −1.11
1455888 | Lingo2 | leucine rich repeat and Ig domain containing 2 | Mm.132507 | 0.007 | −1.05
1426960 | Fa2h | fatty acid 2-hydroxylase | Mm.41083 | 0 | −1.04
1453841 | 2310050P20Rik | RIKEN cDNA 2310050P20 gene | | 0.033 | −1.01
1421207 | Lif | leukemia inhibitory factor | Mm.4964 | 0 | −1

*log2 scale. doi:10.1371/journal.pgen.1001142.t002


other genes whose functions are not well understood are also impacted. Histone methylation by Trithorax or Polycomb complexes can imprint positive and negative epigenetic marks on chromatin during development. More recently, histone methyltransferases have been associated with cancer and other disease states. However, in many cases it is not clear whether changes in the expression of epigenetic modifiers are the cause or the result of disease progression. The results presented here suggest that mutations in an epigenetic pathway, which result in alterations of H3K4 methylation patterns, can lead to a chronic disease through subtle changes in gene expression patterns. This implies a direct function for HMTs in maintaining gene expression and the differentiated state in healthy organisms.

Methods

Animals

Mice carrying the Paxip1 null (Paxip1−) and floxed (Paxip1fl) alleles were previously described and genotyped as indicated [14,53]. To obtain the specific deletion of the Paxip1fl allele in glomerular podocytes, these mice were crossed with the previously characterized 2.5P-Cre mice [24,25], which express the Cre recombinase under the control of the human NPHS2 promoter (CreNPHS2). Among the next generations, mice carrying the Cre allele (Paxip1fl/fl:CreNPHS2 and Paxip1fl/−:CreNPHS2 mice) were considered as conditional null mutants (PTIP−), whereas littermates that did not express the Cre recombinase were used as controls (PTIP+). All animal procedures were approved by the University Committee on Use and Care of Animals (UCUCA) of the University of Michigan and performed in compliance with ULAM recommendations.

Antibodies

Rabbit polyclonal antibodies used to detect Nephrin (1:1000) and Podocin (1:500) were kindly provided by L.B. Holzman (University of Pennsylvania, Philadelphia, PA). Chicken anti-PTIP was described previously [54]. Additional antibodies were commercially available: mouse clone 6F-H2 anti-WT1 (1:1000, DAKO, Carpinteria, CA), anti-H3K4me3 and anti-H3K27me3 (AbCam, Cambridge, MA), anti-Magi2 (Sigma-Aldrich, St. Louis, MO), anti-Ntrk3 (AF1404, R & D Systems, Minneapolis, MN), Alexa Fluor 488 F(ab′)2 fragment of goat anti-rabbit IgG, Alexa Fluor 568 F(ab′)2 fragment of goat anti-mouse IgG, Alexa Fluor 488 donkey anti-goat IgG (1:500; Molecular Probes, Life Technologies, Carlsbad, CA).

Figure 5. Gene Expression in the Glomerulus. Real-time qRT-PCR for the indicated genes was performed on total RNA isolated from glomerular preparations. A) Confirmation of two genes that are down-regulated in PTIP− (black) kidneys compared to control PTIP+ (open) kidneys. B) Confirmation of two genes that are up-regulated in PTIP− kidneys compared to controls. C) Expression levels of podocyte marker genes in PTIP+ and PTIP− glomerular preparations. doi:10.1371/journal.pgen.1001142.g005

mutations are associated with severe nephrotic syndromes [45]. Yet, how foot process outgrowth is regulated and maintained is not clear. Our data suggest that Ntrk3, and by inference its ligand NT-3, may be important for foot process growth and patterning. NT-3 is known to promote neuronal axon guidance by stimulating actin polymerization and lamellipodia formation [46,47]. In cultured neuronal cells, NT-3 promotes localization of β-actin mRNA to the growth cones to stimulate motility and chemotaxis [48,49]. Podocytes express many proteins known to function in neurite outgrowth, such as semaphorins, neuropilins, and ephrins. A recent report even describes the release and uptake of glutamate-containing synaptic-like vesicles by podocytes [50]. Furthermore, foot processes are dynamic and can retract quickly in response to polyamines like protamine sulfate [51,52]. This raises the possibility that sensing mechanisms are required for rapid actin dynamics; such mechanisms may be common to both podocytes and neurons. Still, reduction of Ntrk3 alone is unlikely to cause the phenotypic changes in PTIP− podocytes over time, as

Urine Collection and Analysis

Mice had access to a standard breeder chow (Purina 5008) and water ad libitum. Urine was collected early in the afternoon for three consecutive days from individual mice at 1, 3, 6 and 12 months of age and stored frozen until use. After thawing, 2 µL of urine was run on an SDS-PAGE gel and stained with Coomassie Blue to test for the presence of proteins/albumin, using recombinant mouse albumin (Sigma-Aldrich, St. Louis, MO) as a control. Urine albumin and creatinine concentrations were determined by ELISA using the Albuwell M and Creatinine Companion kits (Exocell Inc., Philadelphia, PA).
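
The albumin to creatinine ratio (ACR) derived from these two measurements is a simple quotient, and the fold differences between genotypes follow from the group means. The Python sketch below illustrates that arithmetic under stated assumptions: the function name `acr`, the units, and every reading in it are hypothetical placeholders chosen for illustration, not data or procedures from this study.

```python
from statistics import mean


def acr(albumin: float, creatinine: float) -> float:
    """Albumin to creatinine ratio for one urine sample.

    Units are an assumption here (e.g. albumin in ug/ml, creatinine in mg/ml);
    any consistent pair of units works because only ratios are compared.
    """
    if creatinine <= 0:
        raise ValueError("creatinine concentration must be positive")
    return albumin / creatinine


# Hypothetical (albumin, creatinine) readings, one tuple per animal.
control_samples = [(20.0, 0.50), (18.0, 0.45), (25.0, 0.50)]
mutant_samples = [(210.0, 0.50), (180.0, 0.45), (260.0, 0.52)]

control_acr = [acr(a, c) for a, c in control_samples]
mutant_acr = [acr(a, c) for a, c in mutant_samples]

# Fold difference between genotypes, computed from the group means.
fold = mean(mutant_acr) / mean(control_acr)
print(f"mean control ACR = {mean(control_acr):.1f}")
print(f"mean mutant ACR  = {mean(mutant_acr):.1f}")
print(f"fold difference  = {fold:.1f}")
```

Normalizing to creatinine corrects for differences in urine concentration between samples, which is why the ratio rather than the raw albumin concentration is compared across animals.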

Specimen Preparation for Microscopy Analyses

Mice at 1, 3, 6, and 12 months of age were sacrificed and their kidneys were perfused, fixed, and processed for histology, indirect fluorescence and electron microscopy analyses. Briefly, mice were anesthetized by intraperitoneal injection of 40 mg/kg sodium


Figure 6. Ntrk3 in the Glomerulus. Fresh frozen tissues were sectioned and fixed in methanol followed by immunostaining with goat anti-Ntrk3, rabbit anti-WT1, or rabbit anti-Nephrin, as indicated. PTIP+ sections (A–C, G–I) showed strong Ntrk3 staining in all glomeruli, in a pattern similar to Nephrin. The PTIP− kidney sections (D–F, J–L) showed much lower levels of Ntrk3 protein in glomeruli. All micrographs were taken at manually set, equal exposures. Right panels (C, F, I, L) are overlays of Ntrk3 and WT1 or Ntrk3 and Nephrin and are counterstained with DAPI (blue) to visualize all cell nuclei. doi:10.1371/journal.pgen.1001142.g006


secondary fluorescent antibodies and DAPI in PBS, 0.1% Triton, 2% goat serum for 1 hour in the dark at room temperature. The sections were washed again and mounted in Mowiol. Stained and fluorescent-labeled sections were analyzed under a Nikon ES800 microscope. Micrographs were taken with a digital spot camera, using equivalent exposure times among sections. For Ntrk3 staining, fresh frozen sections were dried, fixed in methanol at −20°C and washed in PBS, 0.1% Tween 20 before incubation with anti-Ntrk3 antibodies at 1 µg/ml. For quantitation of immunofluorescent signals, ImageJ 1.42 was utilized. H3K4me3-stained sections were digitally captured and light intensity was measured by placing a fixed-size circular area over the nuclei of cells and summing all pixels over the given area. At least 6 podocytes and 6 control cells, either mesangial or endothelial, were measured for each of 8 glomerular tufts (at least 48 podocytes and 48 other cells for each genotype). The average signal intensity was then expressed as a ratio of podocyte intensity to non-podocyte cell intensity for each of the glomerular micrographs taken. For Cre activity detection, the Rosa26-lacZ reporter strain was used [56]. Mice carrying CreNPHS2 and Paxip1fl/fl were crossed to Rosa26-stop-lacZ:Paxip1fl/+ to generate Paxip1fl/fl:CreNPHS2:Rosa26-lacZ animals. Kidneys were excised at 1 month of age and stained for β-galactosidase activity as described [57].
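
The H3K4me3 signal quantitation described above reduces each glomerular tuft to one ratio of podocyte to non-podocyte intensity, and the two genotypes are then compared across tufts. The Python sketch below shows one way that arithmetic could be carried out; it is an illustration under assumptions, not the authors' ImageJ workflow. The function names `tuft_ratio` and `pooled_t`, the use of an equal-variance two-sample t statistic, and all intensity values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def tuft_ratio(podocyte_signals, other_signals):
    """Ratio of mean podocyte signal to mean non-podocyte signal for one tuft."""
    return mean(podocyte_signals) / mean(other_signals)


def pooled_t(a, b):
    """Student's t statistic for two independent samples (equal variances assumed)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))


# Hypothetical integrated intensities (arbitrary units) for a single tuft:
# 6 podocyte nuclei and 6 mesangial or endothelial nuclei.
podocytes = [410.0, 395.0, 420.0, 405.0, 398.0, 412.0]
others = [400.0, 390.0, 415.0, 402.0, 399.0, 408.0]
print(f"ratio for this tuft = {tuft_ratio(podocytes, others):.2f}")

# Hypothetical per-tuft ratios, 8 tufts per genotype.
control_ratios = [1.02, 0.98, 1.05, 1.00, 0.97, 1.03, 1.01, 0.99]
mutant_ratios = [0.61, 0.58, 0.66, 0.63, 0.55, 0.60, 0.64, 0.59]

t_stat = pooled_t(control_ratios, mutant_ratios)
dof = len(control_ratios) + len(mutant_ratios) - 2
print(f"t = {t_stat:.2f} with {dof} degrees of freedom")
```

Expressing each tuft as a ratio uses the neighboring non-podocyte nuclei as an internal reference, so differences in staining or exposure between sections largely cancel out. A p value would be obtained from the t distribution with the stated degrees of freedom, which this sketch leaves to a statistics package.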

Scanning and Transmission Electron Microscopy

Longitudinal slices of kidneys from PTIP+ and PTIP− mice fixed with 2.5% glutaraldehyde in 0.1 M Sorensen’s buffer (pH 7.2) for 2 hours at room temperature were processed for scanning electron microscopy following standard procedures. Briefly, after several washes with the Sorensen’s buffer alone, the samples were dehydrated by successive washes in graded ethanol solutions, critical point dried, mounted on a stub, sputter coated with gold-palladium, and examined under an AMRAY 1910 field emission scanning electron microscope. Pieces of the kidney cortex (1 mm³), fixed with 2.5% glutaraldehyde in Sorensen’s buffer for 2 hours at room temperature, were processed for transmission electron microscopy following standard procedures. They were embedded in PolyBed 812 resin (Polysciences Inc.), cut into 1-micron slices and stained with toluidine blue. Sample areas were selected based on the presence of glomeruli and cut into ultra-thin sections for analysis under a Philips CM-100 transmission electron microscope. The selected SEM and TEM images are representative of at least 10 different glomeruli per kidney.

Figure 7. Chromatin Immunoprecipitation (ChIP) at the Ntrk3 Locus. A) Schematic of the sequences surrounding the first Ntrk3 exon; the transcription start site (+1) and the ATG start codon (+627) are indicated. The positions of the four primers used for PCR analyses of the immunoprecipitated chromatin are shown. B) ChIP experiment using anti-PTIP antibodies and chromatin from whole glomeruli enriched from PTIP+ (open bars) and PTIP− (grey bars) kidneys. C) ChIP experiment as in B but with anti-H3K4me2 antibodies. D) ChIP experiment as in B but using anti-H3K4me3 antibodies. E) ChIP experiment as in B but using anti-H3K27me3 antibodies. For B–E, all values are expressed as the mean of 3 replicates; error bars are one standard deviation. Statistically significant differences are indicated (*P<0.05). doi:10.1371/journal.pgen.1001142.g007

pentobarbital and prepared for systemic perfusion. A saline solution was first injected through the abdominal aorta to the entire mouse body at a pressure of approximately 70 mmHg as previously described [55]. As soon as the general bloodstream had been cleared, a solution of 4% paraformaldehyde in PBS was substituted. It was left to perfuse at the same flow conditions for approximately 10 minutes. Kidneys were removed, decapsulated, cut into pieces, and incubated for 2 additional hours in the appropriate fixative solution before being processed for histology, indirect immunofluorescence, and electron microscopy.

Isolation of Mouse Glomeruli
Glomeruli were isolated from the kidneys of individual mice by sieving as described [58]. Briefly, 1-month-old mice were sacrificed by CO2 inhalation and kidneys were removed. After decapsulation, the kidneys were finely minced on ice and passed sequentially through nylon meshes of 90 and 41 microns (Sefar Filtration Inc., Depew, NY). The glomeruli-enriched fraction (GEF) was retained on top of the 41-micron mesh, while kidney tubules were flushed through. RNA was isolated directly from the mesh.

Histology and Indirect Immunofluorescence
Kidneys were fixed in 4% paraformaldehyde, embedded in paraffin, sectioned at 5 microns, and stained with Periodic Acid-Schiff or Masson Trichrome. For immunofluorescence analyses with Nephrin, PTIP, WT1 and Magi2, sections were dewaxed, rehydrated, and microwaved for 10 minutes in a citric acid-based antigen unmasking solution (Vector Laboratories, Burlingame, CA). Sections were permeabilized with 0.3% Triton X-100 in PBS and blocked with 10% goat serum in PBS. Primary antibodies were incubated overnight at 4°C in PBS, 0.1% Triton, 2% goat serum. Sections were washed twice and incubated with the

RNA Extraction and Reverse Transcription
Total RNA was extracted from the GEF of individual 1-month-old mice using the RNeasy Tissue Micro Kit (Qiagen, Valencia, CA) following the manufacturer’s instructions. RNA concentration and purity were determined by nanodrop analysis on an Agilent Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA). Using the Ovation RNA Amplification System V2 (NuGEN Technologies, San Carlos, CA), 500 ng total RNA was reverse transcribed and linearly amplified into single-stranded cDNA, the concentration and purity of which were determined by nanodrop analysis on an Agilent Bioanalyzer 2100 (Agilent Technologies).

Figure 8. Analysis of Ntrk3 Mutant Kidneys. A) Coomassie-stained SDS/PAGE gels of urine collected from 4-day-old Ntrk3−/− and wild-type littermates. B) Immunostaining of 4-day-old wild-type and Ntrk3−/− kidneys as indicated. From left to right, glomeruli are shown at increasingly older stages of development. Note discontinuous Magi2 staining and reduced Nephrin staining in older glomeruli of Ntrk3−/− kidneys compared to control littermates. doi:10.1371/journal.pgen.1001142.g008

Microarray and Real-Time qPCR Analyses
Microarray analyses were done by the University of Michigan Comprehensive Cancer Center (UMCCC) Affymetrix and Microarray Core Facility. The FL-Ovation cDNA Biotin Module V2 kit (NuGEN Technologies, San Carlos, CA) was used to produce biotin-labeled cRNA, which was then fragmented and hybridized to a Mouse 430 2.0 Affymetrix GeneChip 3′ expression array (Affymetrix, Santa Clara, CA). Array hybridization, washes, staining, and scanning procedures were carried out according to standard Affymetrix protocols. Expression data were normalized by the robust multiarray average (RMA) method and fitted to weighted linear models in R, using the affy and limma packages of Bioconductor, respectively [59,60]. Only probe sets with a variance over all samples greater than 0.1, a p-value less than or equal to 0.05 after adjustment for multiplicity using the false discovery rate [61], and a minimum 2-fold difference in expression were selected for the analysis. The complete data set is available from the Gene Expression Omnibus database (accession number GSE17709). Microarray data were confirmed by real-time quantitative PCR analysis. 25–50 ng single-stranded cDNA was amplified in triplicate in a 384-well plate, using the 7900HT Fast Real Time PCR system (Applied Biosystems, Foster City, CA), and expression levels of selected genes were determined by SYBR Green or TaqMan assays (Applied Biosystems). PCR primer pairs and TaqMan probes used in this study are presented in Table S2.
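The three selection criteria above (variance, FDR-adjusted p-value, and fold change) can be illustrated with a short sketch. It is not the authors' pipeline, which used the affy and limma packages in R; it is a plain-Python illustration that assumes expression values are on the log2 scale (as RMA output is), so that a 2-fold difference corresponds to an absolute log2 fold change of at least 1. The function names, dictionary keys, and toy probe sets are hypothetical.

```python
from math import log2
from statistics import variance


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_probe_sets(probes, min_variance=0.1, max_adj_p=0.05, min_fold=2.0):
    """Return ids of probe sets passing all three filters.

    Each probe is a dict with hypothetical keys: 'id', 'log2_values'
    (log2 expression over all samples), 'log2_fc', and raw p-value 'p'.
    """
    adj = bh_adjust([probe["p"] for probe in probes])
    selected = []
    for probe, q in zip(probes, adj):
        if (variance(probe["log2_values"]) > min_variance
                and q <= max_adj_p
                and abs(probe["log2_fc"]) >= log2(min_fold)):
            selected.append(probe["id"])
    return selected


# Hypothetical toy data: log2 expression across four samples per probe set.
probes = [
    {"id": "A", "log2_values": [7.0, 7.1, 9.0, 9.1], "log2_fc": 2.0, "p": 0.001},
    {"id": "B", "log2_values": [5.0, 5.05, 5.1, 5.0], "log2_fc": 0.05, "p": 0.5},
    {"id": "C", "log2_values": [8.0, 8.2, 8.9, 9.1], "log2_fc": 0.9, "p": 0.02},
]
selected = select_probe_sets(probes)
```

In this toy set only probe set A passes: B fails on variance and adjusted p-value, and C is significant after adjustment but falls short of a 2-fold change.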

Chromatin Immunoprecipitation
Glomeruli were isolated from 6 PTIP+ and 6 PTIP− kidneys by sieving as described above. Glomeruli were resuspended in 1 ml PBS and cross-linked with 1% formaldehyde for 10 minutes with rocking at room temperature. Chromatin preparation, immunoprecipitation, and PCR analysis were essentially as described previously [13]. Primer pairs for the Ntrk3 locus were as follows: P1, 5′-CAATGTATTTTGCTTCCTTGCC, 5′-AAGAAAGGGTTAGGGGAATCCG; P2, 5′-AACCCGTGCGTTTCGTAAGG, 5′-GGAGGAAGGAGGAGAAGGAAGATG; P3, 5′-GCATCTTCCTTCTCCTCCTTCCTC, 5′-AAGTCACCAAGTCCCACCTCCTAG; P4, 5′-TTTGCCTTCCCACCGTCTGTTG, 5′-TGCCTTTGAAACGCCGAAC.

Figure 9. Ultrastructural Analysis of Ntrk3 Mutant Kidneys. Kidneys from Ntrk3−/− (9) and control littermates at 4 days of age were examined by scanning (A, B) and transmission electron microscopy (C, D). A, B) Note the disorganized patterning and irregularly shaped primary and secondary foot processes. C, D) Note the fusion of foot processes and the lack of well-spaced slit diaphragms in Ntrk3 mutants in D. Scale bars are 1 µm in A and B, 2 µm in C, and 500 nm in D. doi:10.1371/journal.pgen.1001142.g009

Acknowledgments
We thank L. Holzman for antibodies and critical discussion, A. Soofi for maintaining the mouse colony, C. Johnson and J. Washburn of the University of Michigan Cancer Center Microarray Core for the microarray analysis, D. Sorenson and C. Edwards for help with the electron microscopy and image analysis, and E. Hughes and T. Saunders of the University of Michigan Transgenic Animal Core for help generating the Paxip1 floxed allele.

Supporting Information
Table S1 Podocyte-specific genes that are unchanged after PTIP deletion. Found at: doi:10.1371/journal.pgen.1001142.s001 (0.03 MB DOC)
Table S2 Quantitative RT-PCR primer sets and probes. Found at: doi:10.1371/journal.pgen.1001142.s002 (0.02 MB DOC)

Author Contributions
Conceived and designed the experiments: GML DK GRD. Performed the experiments: GML SRP DK LT GRD. Analyzed the data: GML SRP GRD. Contributed reagents/materials/analysis tools: LT. Wrote the paper: GRD.

References
1. Campbell KH, McWhir J, Ritchie WA, Wilmut I (1996) Sheep cloned by nuclear transfer from a cultured cell line. Nature 380: 64–66.
2. Okita K, Ichisaka T, Yamanaka S (2007) Generation of germline-competent induced pluripotent stem cells. Nature 448: 313–317.
3. Wernig M, Meissner A, Foreman R, Brambrink T, Ku M, et al. (2007) In vitro reprogramming of fibroblasts into a pluripotent ES-cell-like state. Nature 448: 318–324.
4. Berger SL (2007) The complex language of chromatin regulation during transcription. Nature 447: 407–412.
5. Bernstein BE, Meissner A, Lander ES (2007) The mammalian epigenome. Cell 128: 669–681.
6. Ringrose L, Paro R (2004) Epigenetic regulation of cellular memory by the Polycomb and Trithorax group proteins. Annu Rev Genet 38: 413–443.
7. Schuettengruber B, Chourrout D, Vervoort M, Leblanc B, Cavalli G (2007) Genome regulation by polycomb and trithorax proteins. Cell 128: 735–745.
8. Ringrose L, Paro R (2007) Polycomb/Trithorax response elements and epigenetic memory of cell identity. Development 134: 223–232.
9. Schwartz YB, Pirrotta V (2007) Polycomb silencing mechanisms and the management of genomic programmes. Nat Rev Genet 8: 9–22.
10. Issaeva I, Zonis Y, Rozovskaia T, Orlovsky K, Croce CM, et al. (2007) Knockdown of ALR (MLL2) reveals ALR target genes and leads to alterations in cell adhesion and growth. Mol Cell Biol 27: 1889–1903.
11. Cho YW, Hong T, Hong S, Guo H, Yu H, et al. (2007) PTIP associates with MLL3- and MLL4-containing histone H3 lysine 4 methyltransferase complex. J Biol Chem.
12. Lee J, Saha PK, Yang QH, Lee S, Park JY, et al. (2008) Targeted inactivation of MLL3 histone H3-Lys-4 methyltransferase activity in the mouse reveals vital roles for MLL3 in adipogenesis. Proc Natl Acad Sci U S A 105: 19229–19234.
13. Patel SR, Kim D, Levitan I, Dressler GR (2007) The BRCT-Domain Containing Protein PTIP Links PAX2 to a Histone H3, Lysine 4 Methyltransferase Complex. Dev Cell 13: 580–592.
14. Cho EA, Prindle MJ, Dressler GR (2003) BRCT domain-containing protein PTIP is essential for progression through mitosis. Mol Cell Biol 23: 1666–1673.
15. Kim D, Patel SR, Xiao H, Dressler GR (2009) The role of PTIP in maintaining embryonic stem cell pluripotency. Stem Cells 27: 1516–1523.
16. Cho YW, Hong S, Jin Q, Wang L, Lee JE, et al. (2009) Histone methylation regulator PTIP is required for PPARgamma and C/EBPalpha expression and adipogenesis. Cell Metab 10: 27–39.
17. Fang M, Ren H, Liu J, Cadigan KM, Patel SR, et al. (2009) Drosophila ptip is essential for anterior/posterior patterning in development and interacts with the PcG and trxG pathways. Development 136: 1929–1938.
18. Hansen KH, Bracken AP, Pasini D, Dietrich N, Gehani SS, et al. (2008) A model for transmission of the H3K27me3 epigenetic mark. Nat Cell Biol 10: 1291–1300.
19. Margueron R, Justin N, Ohno K, Sharpe ML, Son J, et al. (2009) Role of the polycomb protein EED in the propagation of repressive histone marks. Nature 461: 762–767.
20. Blobel GA, Kadauke S, Wang E, Lau AW, Zuber J, et al. (2009) A reconfigured pattern of MLL occupancy within mitotic chromatin promotes rapid transcriptional reactivation following mitotic exit. Mol Cell 36: 970–983.
21. Chi P, Allis CD, Wang GG. Covalent histone modifications - miswritten, misinterpreted and mis-erased in human cancers. Nat Rev Cancer 10: 457–469.
22. Gluckman PD, Hanson MA, Buklijas T, Low FM, Beedle AS (2009) Epigenetic mechanisms that underpin metabolic and cardiovascular diseases. Nat Rev Endocrinol 5: 401–408.
23. Wiggins RC (2007) The spectrum of podocytopathies: a unifying view of glomerular diseases. Kidney Int 71: 1205–1214.
24. Moeller MJ, Sanden SK, Soofi A, Wiggins RC, Holzman LB (2002) Two Gene Fragments that Direct Podocyte-Specific Expression in Transgenic Mice. J Am Soc Nephrol 13: 1561–1567.
25. Moeller MJ, Sanden SK, Soofi A, Wiggins RC, Holzman LB (2003) Podocyte-specific expression of cre recombinase in transgenic mice. Genesis 35: 39–42.
26. Ho J, Ng KH, Rosen S, Dostal A, Gregory RI, et al. (2008) Podocyte-specific loss of functional microRNAs leads to rapid glomerular and tubular injury. J Am Soc Nephrol 19: 2069–2075.
27. Suleiman H, Heudobler D, Raschta AS, Zhao Y, Zhao Q, et al. (2007) The podocyte-specific inactivation of Lmx1b, Ldb1 and E2a yields new insight into a transcriptional network in podocytes. Dev Biol 304: 701–712.
28. El-Aouni C, Herbach N, Blattner SM, Henger A, Rastaldi MP, et al. (2006) Podocyte-specific deletion of integrin-linked kinase results in severe glomerular basement membrane alterations and progressive glomerulosclerosis. J Am Soc Nephrol 17: 1334–1344.
29. Wharram BL, Goyal M, Wiggins JE, Sanden SK, Hussain S, et al. (2005) Podocyte depletion causes glomerulosclerosis: diphtheria toxin-induced podocyte depletion in rats expressing human diphtheria toxin receptor transgene. J Am Soc Nephrol 16: 2941–2952.
30. Steger K, Pauls K, Klonisch T, Franke FE, Bergmann M (2000) Expression of protamine-1 and -2 mRNA during human spermiogenesis. Mol Hum Reprod 6: 219–225.
31. Cho C, Willis WD, Goulding EH, Jung-Ha H, Choi YC, et al. (2001) Haploinsufficiency of protamine-1 or -2 causes infertility in mice. Nat Genet 28: 82–86.
32. Genc B, Ozdinler PH, Mendoza AE, Erzurumlu RS (2004) A chemoattractant role for NT-3 in proprioceptive axon guidance. PLoS Biol 2: e403. doi:10.1371/journal.pbio.0020403.
33. Donovan MJ, Hahn R, Tessarollo L, Hempstead BL (1996) Identification of an essential nonneuronal function of neurotrophin 3 in mammalian cardiac development. Nat Genet 14: 210–213.
34. Tessarollo L, Tsoulfas P, Donovan MJ, Palko ME, Blair-Flynn J, et al. (1997) Targeted deletion of all isoforms of the trkC gene suggests the use of alternate receptors by its ligand neurotrophin-3 in neuronal development and implicates trkC in normal cardiogenesis. Proc Natl Acad Sci U S A 94: 14776–14781.
35. Paves H, Saarma M (1997) Neurotrophins as in vitro growth cone guidance molecules for embryonic sensory neurons. Cell Tissue Res 290: 285–297.
36. Milne TA, Briggs SD, Brock HW, Martin ME, Gibbs D, et al. (2002) MLL targets SET domain methyltransferase activity to Hox gene promoters. Mol Cell 10: 1107–1117.
37. Glaser S, Schaft J, Lubitz S, Vintersten K, van der Hoeven F, et al. (2006) Multiple epigenetic maintenance factors implicated by the loss of Mll2 in mouse development. Development 133: 1423–1432.
38. Mu W, Wang W, Schimenti JC (2008) An allelic series uncovers novel roles of the BRCT domain-containing protein PTIP in mouse embryonic vascular development. Mol Cell Biol 28: 6439–6451.
39. Pabst R, Sterzel RB (1983) Cell renewal of glomerular cell types in normal rats. An autoradiographic analysis. Kidney Int 24: 626–631.
40. Appel D, Kershaw DB, Smeets B, Yuan G, Fuss A, et al. (2009) Recruitment of podocytes from glomerular parietal epithelial cells. J Am Soc Nephrol 20: 333–343.
41. Ronconi E, Sagrinati C, Angelotti ML, Lazzeri E, Mazzinghi B, et al. (2009) Regeneration of glomerular podocytes by human renal progenitors. J Am Soc Nephrol 20: 322–332.
42. Wykes SM, Krawetz SA (2003) The structural organization of sperm chromatin. J Biol Chem 278: 29471–29477.
43. Cuthbert GL, Daujat S, Snowden AW, Erdjument-Bromage H, Hagiwara T, et al. (2004) Histone deimination antagonizes arginine methylation. Cell 118: 545–553.
44. Faul C, Asanuma K, Yanagida-Asanuma E, Kim K, Mundel P (2007) Actin up: regulation of podocyte structure and function by components of the actin cytoskeleton. Trends Cell Biol 17: 428–437.
45. Patrakka J, Tryggvason K (2007) Nephrin–a unique structural and signaling protein of the kidney filter. Trends Mol Med 13: 396–403.


46. Castellani V, Bolz J (1999) Opposing roles for neurotrophin-3 in targeting and collateral formation of distinct sets of developing cortical neurons. Development 126: 3335–3345.
47. Tessarollo L, Coppola V, Fritzsch B (2004) NT-3 replacement with brain-derived neurotrophic factor redirects vestibular nerve fibers to the cochlea. J Neurosci 24: 2575–2584.
48. Zhang HL, Eom T, Oleynikov Y, Shenoy SM, Liebelt DA, et al. (2001) Neurotrophin-induced transport of a beta-actin mRNP complex increases beta-actin levels and stimulates growth cone motility. Neuron 31: 261–275.
49. Zhang HL, Singer RH, Bassell GJ (1999) Neurotrophin regulation of beta-actin mRNA and protein localization within growth cones. J Cell Biol 147: 59–70.
50. Rastaldi MP, Armelloni S, Berra S, Calvaresi N, Corbelli A, et al. (2006) Glomerular podocytes contain neuron-like functional synaptic vesicles. Faseb J 20: 976–978.
51. Kerjaschki D (1978) Polycation-induced dislocation of slit diaphragms and formation of cell junctions in rat kidney glomeruli: the effects of low temperature, divalent cations, colchicine, and cytochalasin B. Lab Invest 39: 430–440.
52. Kurihara H, Anderson JM, Kerjaschki D, Farquhar MG (1992) The altered glomerular filtration slits seen in puromycin aminonucleoside nephrosis and protamine sulfate-treated rats contain the tight junction protein ZO-1. Am J Pathol 141: 805–816.
53. Kim D, Wang M, Cai Q, Brooks H, Dressler GR (2007) Pax transactivation-domain interacting protein is required for urine concentration and osmotolerance in collecting duct epithelia. J Am Soc Nephrol 18: 1458–1465.
54. Lechner MS, Levitan I, Dressler GR (2000) PTIP, a novel BRCT domain-containing protein interacts with Pax2 and is associated with active chromatin. Nucleic Acids Res 28: 2741–2751.
55. Verma R, Wharram B, Kovari I, Kunkel R, Nihalani D, et al. (2003) Fyn binds to and phosphorylates the kidney slit diaphragm component Nephrin. J Biol Chem 278: 20716–20723.
56. Soriano P (1999) Generalized lacZ expression with the ROSA26 Cre reporter strain. Nat Genet 21: 70–71.
57. Kim D, Dressler GR (2005) Nephrogenic factors promote differentiation of mouse embryonic stem cells into renal epithelia. J Am Soc Nephrol 16: 3527–3534.
58. Salant DJ, Darby C, Couser WG (1980) Experimental membranous glomerulonephritis in rats. Quantitative studies of glomerular immune deposit formation in isolated glomeruli and whole animals. J Clin Invest 66: 71–81.
59. Irizarry RA, Wu Z, Jaffee HA (2006) Comparison of Affymetrix GeneChip expression measures. Bioinformatics 22: 789–794.
60. Smyth GK (2004) Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol 3: Article3.
61. Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical approach to multiple testing. J Royal Stat Soc 57: 289–300.



The IG-DMR and the MEG3-DMR at Human Chromosome 14q32.2: Hierarchical Interaction and Distinct Functional Properties as Imprinting Control Centers

Masayo Kagami1, Maureen J. O’Sullivan2, Andrew J. Green3,4, Yoshiyuki Watabe5, Osamu Arisaka5, Nobuhide Masawa6, Kentarou Matsuoka7, Maki Fukami1, Keiko Matsubara1, Fumiko Kato1, Anne C. Ferguson-Smith8, Tsutomu Ogata1*

1 Department of Endocrinology and Metabolism, National Research Institute for Child Health and Development, Tokyo, Japan, 2 Department of Pathology, School of Medicine, Our Lady’s Children’s Hospital, Trinity College, Dublin, Ireland, 3 National Center for Medical Genetics, University College Dublin, Our Lady’s Hospital, Dublin, Ireland, 4 School of Medicine and Medical Science, University College, Dublin, Ireland, 5 Department of Pediatrics, Dokkyo University School of Medicine, Tochigi, Japan, 6 Department of Pathology, Dokkyo University School of Medicine, Tochigi, Japan, 7 Department of Pathology, National Center for Child Health and Development, Tokyo, Japan, 8 Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, United Kingdom

Abstract
Human chromosome 14q32.2 harbors the germline-derived primary DLK1-MEG3 intergenic differentially methylated region (IG-DMR) and the postfertilization-derived secondary MEG3-DMR, together with multiple imprinted genes. Although previous studies in cases with microdeletions and epimutations affecting both DMRs and paternal/maternal uniparental disomy 14-like phenotypes argue for a critical regulatory function of the two DMRs for the 14q32.2 imprinted region, the precise role of the individual DMR remains to be clarified. We studied an infant with upd(14)pat body and placental phenotypes and a heterozygous microdeletion involving the IG-DMR alone (patient 1) and a neonate with upd(14)pat body, but no placental phenotype and a heterozygous microdeletion involving the MEG3-DMR alone (patient 2). The results generated from the analysis of these two patients imply that the IG-DMR and the MEG3-DMR function as imprinting control centers in the placenta and the body, respectively, with a hierarchical interaction for the methylation pattern in the body governed by the IG-DMR. To our knowledge, this is the first study demonstrating an essential long-range imprinting regulatory function for the secondary DMR.

Citation: Kagami M, O’Sullivan MJ, Green AJ, Watabe Y, Arisaka O, et al. (2010) The IG-DMR and the MEG3-DMR at Human Chromosome 14q32.2: Hierarchical Interaction and Distinct Functional Properties as Imprinting Control Centers. PLoS Genet 6(6): e1000992. doi:10.1371/journal.pgen.1000992
Editor: Wolf Reik, The Babraham Institute, United Kingdom
Received December 29, 2009; Accepted May 19, 2010; Published June 17, 2010
Copyright: © 2010 Kagami et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work was supported by grants from the Ministry of Health, Labor, and Welfare; from the Ministry of Education, Science, Sports and Culture; and from Takeda Science Foundation. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing Interests: The authors have declared that no competing interests exist.
* E-mail: tomogata@nch.go.jp

Introduction
Human chromosome 14q32.2 carries a cluster of protein-coding paternally expressed genes (PEGs) such as DLK1 and RTL1 and non-coding maternally expressed genes (MEGs) such as MEG3 (alias, GTL2), RTL1as (RTL1 antisense), MEG8, snoRNAs, and microRNAs [1,2]. Consistent with this, paternal uniparental disomy 14 (upd(14)pat) results in a unique phenotype characterized by facial abnormality, small bell-shaped thorax, abdominal wall defects, placentomegaly, and polyhydramnios [2,3], and maternal uniparental disomy 14 (upd(14)mat) leads to less-characteristic but clinically discernible features including growth failure [2,4]. The 14q32.2 imprinted region also harbors two differentially methylated regions (DMRs), i.e., the germline-derived primary DLK1-MEG3 intergenic DMR (IG-DMR) and the postfertilization-derived secondary MEG3-DMR [1,2]. Both DMRs are hypermethylated after paternal transmission and hypomethylated after maternal transmission in the body, whereas in the placenta the IG-DMR alone remains as a DMR and the MEG3-DMR is rather hypomethylated [1,2]. Furthermore, previous studies in cases with upd(14)pat/mat-like phenotypes have revealed that epimutations (hypermethylation) and microdeletions affecting both DMRs of maternal origin cause paternalization of the 14q32.2 imprinted region, and that epimutations (hypomethylation) affecting both DMRs of paternal origin cause maternalization of the 14q32.2 imprinted region, while microdeletions involving the DMRs of paternal origin have no effect on the imprinting status [2,5–8]. These findings, together with the notion that parent-of-origin specific expression patterns of imprinted genes are primarily dependent on the methylation status of the DMRs [9], argue for a critical regulatory function of the two DMRs for the 14q32.2 imprinted region, with possible different effects between the body and the placenta. However, the precise role of individual DMR remains to be clarified. Here, we report that the IG-DMR and the MEG3-DMR show a hierarchical interaction for the methylation pattern in the body, and function as imprinting control centers in the placenta and the body, respectively. To our knowledge, this is the first study demonstrating not only different roles between the primary and secondary DMRs at a single imprinted region, but also an essential regulatory function for the secondary DMR.


June 2010 | Volume 6 | Issue 6 | e1000992


Imprinting Control Centers at Human 14q32.2

Author Summary
Genomic imprinting is a process causing genes to be expressed in a parent-of-origin specific manner: some imprinted genes are expressed from maternally inherited chromosomes and others from paternally inherited chromosomes. Imprinted genes are often located in clusters regulated by regions that are differentially methylated according to their parental origin. The human chromosome 14q32.2 imprinted region harbors the germline-derived primary DLK1-MEG3 intergenic differentially methylated region (IG-DMR) and the postfertilization-derived secondary MEG3-DMR, together with multiple imprinted genes. Perturbed dosage of these imprinted genes, for example in patients with paternal and maternal uniparental disomy 14, causes distinct phenotypes. Here, through analysis of patients with microdeletions recapitulating some or all of the uniparental disomy 14 phenotypes, we show that the IG-DMR acts as an upstream regulator for the methylation pattern of the MEG3-DMR in the body but not in the placenta. Importantly, in the body, the MEG3-DMR functions as an imprinting control center. To our knowledge, this is the first study demonstrating an essential function for the secondary DMR in the regulation of multiple imprinted genes. Thus, the results provide a significant advance in the clarification of underlying epigenetic features that can act to regulate imprinting.

Results
Clinical reports
We studied an infant with upd(14)pat body and placental phenotypes (patient 1) and a neonate with upd(14)pat body, but no placental, phenotype (patient 2) (Figure 1). Detailed clinical features of patients 1 and 2 are shown in Table 1. In brief, patient 1 was delivered by a caesarean section at 33 weeks of gestation due to progressive polyhydramnios despite amnioreduction at 28 and 30 weeks of gestation, whereas patient 2 was born at 28 weeks of gestation by a vaginal delivery due to progressive labor without discernible polyhydramnios. Placentomegaly was observed in patient 1 but not in patient 2. Patients 1 and 2 were found to have characteristic face, small bell-shaped thorax with coat hanger appearance of the ribs, and omphalocele. Patient 1 received surgical treatment for omphalocele immediately after birth and mechanical ventilation for several months. At present, she is 5.5 months of age, and still requires intensive care including oxygen administration and tube feeding. Patient 2 died at four days of age due to massive intracranial hemorrhage, while receiving intensive care including mechanical ventilation. The mother of patient 1 had several non-specific clinical features such as short stature and obesity. The father of patient 1 and the parents of patient 2 were clinically normal.

Figure 1. Clinical phenotypes of patients 1 and 2 at birth. Both patients have bell-shaped thorax with coat hanger appearance of the ribs and omphalocele. In patient 1, histological examination of the placenta shows proliferation of dilated and congested chorionic villi, as has previously been observed in a case with upd(14)pat [2]. For comparison, the histological finding of a gestational age-matched (33 weeks) control placenta is shown in a dashed square. The horizontal black bars indicate 100 µm. doi:10.1371/journal.pgen.1000992.g001

Sample preparation
We isolated genomic DNA (gDNA) and transcripts (mRNAs, snoRNAs, and microRNAs) from fresh leukocytes of patient 1 and the parents of patients 1 and 2, from fresh skin fibroblasts of patient 2, and from formalin-fixed and paraffin-embedded placental samples of patient 1 and similarly treated pituitary and adrenal samples of patient 2 (although multiple body tissues were available in patient 2, useful gDNA and transcript samples were not obtained from other tissues, probably due to drastic postmortem degradation). We also made metaphase spreads from leukocytes and skin fibroblasts. For comparison, we obtained control samples from fresh normal adult leukocytes, neonatal skin fibroblasts, and placenta at 38 weeks of gestation, and from fresh leukocytes of upd(14)pat/mat patients and formalin-fixed and paraffin-embedded placenta of a upd(14)pat patient [2,3].

Structural analysis of the imprinted region
We first examined the structure of the 14q32.2 imprinted region (Figure 2). Upd(14) was excluded in patients 1 and 2 as well as in the mother of patient 1 by microsatellite analysis (Table S1), and FISH analysis for the two DMRs identified a familial heterozygous deletion encompassing the IG-DMR alone in patient 1 and her mother and a de novo heterozygous deletion encompassing the MEG3-DMR alone in patient 2 (Figure 2). The microdeletions were further localized by SNP genotyping for 70 loci (Table S1) and quantitative real-time PCR (q-PCR) analysis for four regions around the DMRs (Figure S1A), and serial direct sequencing for the long PCR products harboring the deletion junctions successfully identified the fusion points of the microdeletions in patient 1 and her mother and in patient 2 (Figure 2). According to the NT_026437 sequence data at the NCBI Database (Genome Build 36.3) (http://preview.ncbi.nlm.nih.gov/guide/), the deletion


Table 1. Clinical features in patients 1 and 2.

Feature | Patient 1 | Patient 2 | Upd(14)pat (n = 20)c
Present age | 5.5 months | Deceased at 4 days | 0–9 years
Sex | Female | Female | Male:Female = 9:11
Karyotype | 46,XX | 46,XX |
Pregnancy and delivery
Gestational age (weeks) | 33 | 28 | 28–37
Delivery | Caesarean | Vaginal | Vaginal:Caesarean = 6:7
Polyhydramnios | Yes | No | 20/20 (<28)d
Amnioreduction (weeks) | 2× (28, 30) | No | 6/6
Placentomegaly | Yes | No | 10/10
Growth pattern
Prenatal growth failure | No | No | 1/13
Birth length (cm) | 43 (WNR)a | 34 (WNR)a |
Birth weight (kg) | 2.84 (>90 centile)a | 1.32 (WNR)a |
Postnatal growth failure | Yes | | 5/6
Present stature (cm) | 56.3 (−3.0 SD)b | |
Present weight (kg) | 5.02 (−3.0 SD)b | |
Characteristic face
Frontal bossing | No | Yes | 5/7
Hairy forehead | Yes | Yes | 9/10
Blepharophimosis | Yes | No | 14/15
Depressed nasal bridge | Yes | Yes | 13/13
Anteverted nares | Yes | No | 6/10
Small ears | Yes | Yes | 11/12
Protruding philtrum | Yes | No | 15/15
Puckered lips | No | No | 3/10
Micrognathia | Yes | Yes | 11/12
Thoracic abnormality
Bell-shaped thorax | Yes | Yes | 17/17
Mechanical ventilation | Yes | Yes | 17/17
Abdominal wall defect
Diastasis recti | | | 15/17
Omphalocele | Yes | Yes | 2/17e
Others
Short webbed neck | Yes | Yes | 14/14
Cardiac disease | No | Yes (PDA) | 5/10
Inguinal hernia | No | No | 2/6
Coxa valga | Yes | No | 3/4
Joint contractures | Yes | No | 8/10
Kyphoscoliosis | No | No | 4/7
Extra features: Hydronephrosis (bilateral)

WNR: within the normal range; SD: standard deviation; and PDA: patent ductus arteriosus.
a Assessed by the gestational age- and sex-matched Japanese reference data from the Ministry of Health, Labor, and Welfare (http://www.e-stat.go.jp/SG1/estat/GL02020101.do).
b Assessed by the age- and sex-matched Japanese reference data.
c In the column summarizing the clinical features of 20 patients with upd(14)pat, the denominators indicate the number of cases examined for the presence or absence of each feature, and the numerators represent the number of cases assessed to be positive for that feature; thus, the differences between the denominators and the numerators denote the number of cases evaluated to be negative for that feature (adopted from reference [2]).
d Polyhydramnios has been identified by 28 weeks of gestation.
e Omphalocele is present in two cases with upd(14)pat and in two cases with epimutations [2].
doi:10.1371/journal.pgen.1000992.t001


Figure 2. Physical map of the 14q32.2 imprinted region and the deleted segments in patient 1 and her mother and in patient 2 (shaded in gray). PEGs are shown in blue, MEGs in red, and the IG-DMR (CG4 and CG6) and the MEG3-DMR (CG7) in green. It remains to be clarified whether DIO3 is a PEG, although mouse Dio3 is known to be preferentially but not exclusively expressed from a paternally derived chromosome [35]. For MEG3, the isoform 2 with nine exons (red bars) and eight introns (light red segment) is shown (Ensembl; http://www.ensembl.org/index.html). Electrochromatograms represent the fusion point in patient 1 and her mother, and the fusion point accompanied by insertion of a 66 bp segment (highlighted in blue) with a sequence identical to that within MEG3 intron 5 (the blue bar) in patient 2. PCR amplification with primers flanking the 66 bp segment at MEG3 intron 5 has produced a single 194 bp band in patient 2 as well as in a control subject (shown in the box; marker sizes: 100, 200, and 300 bp). This indicates that the 66 bp segment at the fusion point results from a duplicated insertion rather than from a transfer from intron 5 to the fusion point; if the 66 bp segment had been transferred from its original position, a 128 bp band as well as a 194 bp band should be present in patient 2. In the FISH images, the red signals (arrows) have been identified by the FISH-1 probe and the FISH-2 probe, and the light green signals (arrowheads) by the RP11-566I2 probe for 14q12 used as an internal control. The faint signal detected by the FISH-2 probe in patient 2 is consistent with the preservation of a ~1.2 kb region identified by the centromeric portion of the FISH-2 probe. doi:10.1371/journal.pgen.1000992.g002
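The deletion sizes reported in the Results and the band sizes quoted in this legend can be cross-checked with simple arithmetic. The following is only an illustrative sketch, not part of the original analysis: it assumes that the reported NT_026437 positions are 1-based and inclusive (an assumption that happens to reproduce the stated sizes), and the helper name `interval_length` is hypothetical.

```python
def interval_length(start: int, end: int) -> int:
    """Length in bp of a 1-based, inclusive genomic interval (assumed convention)."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


# Coordinates on NT_026437 as reported in the text.
deletion_patient1 = interval_length(82_270_449, 82_279_006)  # patient 1 and her mother
deletion_patient2 = interval_length(82_290_978, 82_295_280)  # patient 2
inserted_segment = interval_length(82_299_727, 82_299_792)   # segment within MEG3 intron 5

# Control PCR across MEG3 intron 5 gives 194 bp when the segment is in place.
intact_band = 194
# Band expected only if the segment had been moved out of intron 5.
band_if_segment_moved = intact_band - inserted_segment

print(deletion_patient1, deletion_patient2, inserted_segment, band_if_segment_moved)
```

Under the stated assumption, the printed values correspond to the 8,558 bp and 4,303 bp deletions, the 66 bp segment, and the 128 bp band that was not observed.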


Figure 3. Methylation analysis of the IG-DMR (CG4 and CG6) and the MEG3-DMR (CG7). Filled and open circles indicate methylated and unmethylated cytosines at the CpG dinucleotides, respectively. (A) Structure of CG4, CG6, and CG7. Pat: paternally derived chromosome; and Mat: maternally derived chromosome. The PCR products for CG4 (311 bp) harbor 6 CpG dinucleotides and a G/A SNP (rs12437020), and are digested with BstUI into three fragments (33 bp, 18 bp, and 260 bp) when the cytosines at the first and the second CpG dinucleotides and the fourth and the fifth CpG dinucleotides (indicated with orange rectangles) are methylated. The PCR products for CG6 (428 bp) carry 19 CpG dinucleotides and a C/T SNP (rs10133627), and are digested with TaqI into two fragments (189 bp and 239 bp) when the cytosine at the 9th CpG dinucleotide (indicated with an orange rectangle) is methylated. The PCR products for CG7 harbor 7 CpG dinucleotides, and are digested with BstUI into two fragments (56 bp and 112 bp) when the cytosines at the fourth and the fifth CpG dinucleotides (indicated with orange rectangles) are methylated. These enzymes have been utilized for combined bisulfite restriction analysis (COBRA). (B) Methylation analysis. Upper part shows bisulfite sequencing data. The SNP typing data are also denoted for CG4 and CG6. The circles highlighted in orange correspond to those shown in Figure 3A. The relatively long CG6 was not amplified from the formalin-fixed and paraffin-embedded placental samples, probably because of the degradation of genomic DNA. Note that CG4 is differentially methylated in a control placenta and is massively hypermethylated in a upd(14)pat placenta, whereas CG7 is rather hypomethylated in a upd(14)pat placenta as well as in a control placenta. Lower part shows COBRA data. U: unmethylated clone specific bands (311 bp for CG4, 428 bp for CG6, and 168 bp for CG7); and M: methylated clone specific bands (260 bp for CG4, 239 bp and 189 bp for CG6, and 112 bp and 56 bp for CG7). The results reproduce the bisulfite sequencing data, and delineate normal findings of the father of patient 1 and the parents of patient 2. doi:10.1371/journal.pgen.1000992.g003
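The COBRA fragment sizes in this legend should add up to the length of each undigested PCR product, and the presence of U and M bands is what distinguishes a hypomethylated, hypermethylated, or differentially methylated region. The sketch below illustrates that bookkeeping under simplifying assumptions: the dictionary layout and function names are hypothetical, band calling is treated as purely qualitative (present or absent), and incomplete digestion is ignored.

```python
# Amplicon lengths and fragments expected from methylated clones,
# as given in the Figure 3 legend.
COBRA_ASSAYS = {
    "CG4": {"enzyme": "BstUI", "amplicon": 311, "methylated": (260, 33, 18)},
    "CG6": {"enzyme": "TaqI", "amplicon": 428, "methylated": (239, 189)},
    "CG7": {"enzyme": "BstUI", "amplicon": 168, "methylated": (112, 56)},
}


def fragments_are_consistent(assay: dict) -> bool:
    """Digested fragments must sum to the undigested amplicon length."""
    return sum(assay["methylated"]) == assay["amplicon"]


def classify_bands(region: str, observed: set) -> str:
    """Qualitative call from the set of observed band sizes (bp)."""
    assay = COBRA_ASSAYS[region]
    has_unmethylated = assay["amplicon"] in observed
    has_methylated = any(size in observed for size in assay["methylated"])
    if has_unmethylated and has_methylated:
        return "differentially methylated"
    if has_methylated:
        return "hypermethylated"
    if has_unmethylated:
        return "hypomethylated"
    return "no informative band"


for name, assay in COBRA_ASSAYS.items():
    print(name, assay["enzyme"], fragments_are_consistent(assay))
```

For example, `classify_bands("CG4", {311, 260})` returns "differentially methylated", the pattern expected when both a methylated and an unmethylated allele are present.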

size was 8,558 bp (82,270,449–82,279,006 bp) for the microdeletion in patient 1 and her mother, and 4,303 bp (82,290,978–82,295,280 bp) for the microdeletion in patient 2. The microdeletion in patient 2 also involved the 5′ part of MEG3 and five of the seven putative CTCF binding sites A–G [10], and was accompanied by insertion of a 66 bp sequence duplicated from MEG3 intron 5 (82,299,727–82,299,792 bp on NT_026437). Direct sequencing of the exonic or transcribed regions detected no mutation in DLK1, MEG3, and RTL1, although several cDNA polymorphisms (cSNPs) were identified (Table S1). Oligoarray comparative genomic hybridization identified no other discernible structural abnormality (Figure S1B).

Methylation analysis of the two DMRs and the seven putative CTCF binding sites

We next studied methylation patterns of the previously reported IG-DMR (CG4 and CG6) and MEG3-DMR (CG7) (Figure 3A) [2], using bisulfite treated gDNA samples. Bisulfite sequencing and combined bisulfite restriction analysis using body samples revealed a hypermethylated IG-DMR and MEG3-DMR in patient 1, a hypomethylated IG-DMR and differentially methylated MEG3-DMR in the mother of patient 1, and a differentially methylated IG-DMR and hypermethylated MEG3-DMR in patient 2, and bisulfite sequencing using placental samples showed a hypermethylated IG-DMR and rather hypomethylated MEG3-DMR in patient 1 (Figure 3B). We also examined methylation patterns of the seven putative CTCF binding sites by bisulfite sequencing (Figure 4A). The sites C and D alone exhibited DMRs in the body and were rather hypomethylated in the placenta (Figure 4B), as observed in CG7. Furthermore, to identify an informative SNP(s) pattern for allele-specific bisulfite sequencing, we examined a 349 bp region encompassing the site C and a 356 bp region encompassing the site D as well as a 300 bp region spanning the previously reported three SNPs near the site D, in 120 control subjects, the cases with upd(14)pat/mat, and patients 1 and 2 and their parents. Consequently, an informative polymorphism was identified for a novel G/A SNP near the site D in only a single control subject, and the parent-of-origin specific methylation pattern was confirmed (Figure 4C). No informative SNP was found in the examined region around the site C, and no other informative SNP was identified in the two examined regions around the site D, with the previously known three SNPs being present in a homozygous condition in all the subjects analyzed.

Expression analysis of the imprinted genes

Finally, we performed expression analyses, using standard reverse transcriptase (RT)-PCR and/or q-PCR analysis for multiple imprinted genes in this region (Figure 5A–5C). For leukocytes, weak expression was detected for MEG3 and SNORD114-29 in a control subject and the mother of patient 1 but not in patient 1. For skin fibroblasts, although all MEGs but no PEGs were expressed in control subjects, neither MEGs nor PEGs were expressed in patient 2. For placentas, although all imprinted genes were expressed in control subjects, PEGs only were expressed in patient 1. For the pituitary and adrenal of patient 2, DLK1 expression alone was identified. Expression pattern analyses using informative cSNPs revealed monoallelic MEG3 expression in the leukocytes of the mother of patient 1 (Figure 5D), and biparental RTL1 expression in the placenta of patient 1 (no informative cSNP was detected for DLK1) and biparental DLK1 expression in the pituitary and adrenal of patient 2 (RTL1 was not expressed in the pituitary and adrenal) (Figure 5E), as well as maternal MEG3 expression in the control leukocytes and paternal RTL1 expression in the control placentas (Figure S2). Although we also attempted q-PCR analysis, precise assessment was impossible for MEG3 in the mother of patient 1 because of faint expression level in leukocytes and for RTL1 in patient 1 and DLK1 in patient 2 because of poor quality of mRNAs obtained from formalin-fixed and paraffin-embedded tissues.

Discussion

The data of the present study are summarized in Figure 6. Parental origin of the microdeletion positive chromosomes is based on the methylation patterns of the preserved DMRs in patients 1 and 2 and the mother of patient 1 as well as maternal transmission in patient 1. Loss of the hypomethylated IG-DMR of maternal origin in patient 1 was associated with epimutation (hypermethylation) of the MEG3-DMR in the body and caused paternalization of the imprinted region and typical upd(14)pat body and placental phenotypes, whereas loss of the hypomethylated MEG3-DMR of maternal origin in patient 2 permitted normal methylation pattern of the IG-DMR in the body and resulted in maternal to paternal epigenotypic alteration and typical upd(14)pat body, but no placental, phenotype. In this regard, while a 66 bp segment was inserted in patient 2, this segment contains no known regulatory sequence [11] or evolutionarily conserved element [12] (also examined with a VISTA program, http://genome.lbl.gov/vista/index.shtml). Similarly, while no control samples were available for pituitary and adrenal, the previous study in human subjects has shown paternal DLK1 expression in adrenal as well as monoallelic DLK1 and MEG3 expressions in various tissues [11]. Furthermore, the present and the previous studies [2] indicate that this region is imprinted in the placenta as well as in the body. Thus, these results, in conjunction with the finding that the IG-DMR remains as a DMR and the MEG3-DMR exhibits a non-DMR in the placenta [2], imply the following: (1) the IG-DMR functions hierarchically as an upstream regulator for the methylation pattern of the MEG3-DMR on the maternally inherited chromosome in the body, but not in the placenta; (2) the hypomethylated


Figure 4. Methylation analysis of the putative CTCF protein binding sites A–G. (A) Location and sequence of the putative CTCF binding sites. In the left part, the sites C and D are painted in yellow and the remaining sites in purple. In the right part, the consensus CTCF binding motifs are shown in red letters; the cytosine residues at the CpG dinucleotides within the CTCF binding motifs are highlighted in blue, and those outside the CTCF binding motifs are highlighted in green [10]. (B) Methylation analysis. Upper part shows bisulfite sequencing data, using leukocyte genomic DNA samples. Since PCR products for the site B contain a C/A SNP (rs11627993), genotyping data are also indicated. The circles highlighted in blue correspond to those shown in Figure 4A. The sites C and D exhibit clear DMRs. Lower part indicates the results of the sites C and D using leukocyte and/or placental genomic DNA samples. The findings are similar to those of CG7. (C) Allele-specific methylation pattern of the CTCF binding site D. A novel G/A SNP has been identified in a single control subject, as shown on a reverse chromatogram delineating a C/T SNP pattern, while the previously reported three SNPs were present in a homozygous condition. Methylated and unmethylated clones are associated with the “G” and the “A” alleles, respectively. doi:10.1371/journal.pgen.1000992.g004

MEG3-DMR functions as an essential imprinting regulator for both PEGs and MEGs in the body; and (3) in the placenta, the hypomethylated IG-DMR directly controls the imprinting pattern of both PEGs and MEGs. These notions also explain the epigenotypic alteration in the previous cases with epimutations or microdeletions affecting both DMRs (Figure S3).

It remains to be clarified how the IG-DMR and the MEG3-DMR interact hierarchically in the body. However, the present data, together with the previous findings in cases with epimutations [2,5–8], imply that the MEG3-DMR can remain hypomethylated only in the presence of a hypomethylated IG-DMR and is methylated when the IG-DMR is deleted or methylated irrespective of the parental origin. Furthermore, mouse studies have suggested that the methylation pattern of the postfertilization-derived Gtl2-DMR (the mouse homolog for the MEG3-DMR) is dependent on that of the germline-derived IG-DMR [13]. Thus, a preferential binding of some factor(s) to the unmethylated IG-DMR may cause a conformational alteration of the genomic structure, thereby protecting the MEG3-DMR from methylation.

It also remains to be elucidated how the IG-DMR and the MEG3-DMR regulate the expression of both PEGs and MEGs in the placenta and the body, respectively. For the MEG3-DMR, however, the CTCF binding sites C and D may play a pivotal role in the imprinting regulation. The methylation analysis indicates that the two sites reside within the MEG3-DMR, and it is known that the CTCF protein with versatile functions preferentially binds to unmethylated target sequences including the sites C and D [10,14–16]. In this regard, all the MEGs in this imprinted region can be transcribed together in the same orientation and show a strikingly similar tissue expression pattern [1,12], whereas PEGs are transcribed in different directions and are co-expressed with MEGs only in limited cell-types [1,17]. It is possible, therefore, that preferential CTCF binding to the grossly unmethylated sites C and D activates all the MEGs as a large transcription unit and represses all the PEGs perhaps by influencing chromatin structure and histone modification independently of the effects of expressed MEGs. In support of this, CTCF protein acts as a transcriptional activator for Gtl2 (the mouse homolog for MEG3) in the mouse [18].

Such an imprinting control model has not been proposed previously. It is different from the CTCF protein-mediated insulator model indicated for the H19-DMR and from the noncoding RNA-mediated model implicated for several imprinted regions including the KvDMR1 [19]. However, the KvDMR1 harbors two putative CTCF binding sites that may mediate noncoding RNA independent imprinting regulation [20], and the imprinting control center for Prader-Willi syndrome [21] also carries three CTCF binding sites (examined with a Search for CTCF DNA Binding Sites program, http://www.essex.ac.uk/bs/molonc/spa.html). Thus, while each imprinted region would be regulated by a different mechanism, a CTCF protein may be involved in the imprinting control of multiple regions, in various manners.

This imprinted region has also been studied in the mouse. Clinical and molecular findings in wildtype mice [1,22,23], mice with PatDi(12) (paternal disomy for chromosome 12 harboring this imprinted region) [13,24,25], and mice with targeted deletions for the IG-DMR (ΔIG-DMR) [22,26] and for the Gtl2-DMR (ΔGtl2-DMR) [27] are summarized in Table 2. These data, together with human data, provide several informative findings. First, in both the human and the mouse, the IG-DMR is differentially methylated in both the body and the placenta, whereas the MEG3/Gtl2-DMR is differentially methylated in the body and exhibits non-DMR in the placenta. Second, the IG-DMR and the MEG3/Gtl2-DMR show a hierarchical interaction on the maternally derived chromosome in both the human and the mouse bodies. Indeed, the MEG3/Gtl2-DMR is epimutated in patient 1 and mice with maternally inherited ΔIG-DMR, and the IG-DMR is normally methylated in patient 2 and mice with maternally inherited ΔGtl2-DMR. Third, the function of the IG-DMR is comparable between human and mouse bodies and different between human and mouse placentas. Indeed, patient 1 has upd(14)pat body and placental phenotypes, whereas mice with the ΔIG-DMR of maternal origin have PatDi(12)-compatible body phenotype and apparently normal placental phenotype. It is likely that imprinting regulation in the mouse placenta is contributed by some mechanism(s) other than the methylation pattern of the IG-DMR, such as chromatin conformation [22,28,29].

Unfortunately, however, the data of ΔGtl2-DMR mice appear to be drastically complicated by the retained neomycin cassette in the upstream region of Gtl2. Indeed, it has been shown that the insertion of a lacZ gene or a neomycin gene in the similar upstream region of Gtl2 causes severely dysregulated expression patterns and abnormal phenotypes after both paternal and maternal transmissions [30,31], and that deletion of the inserted neomycin gene results in apparently normal expression patterns and phenotypes after both paternal and maternal transmissions [31]. (In this regard, although a possible influence of the inserted 66 bp segment cannot be excluded formally in patient 2, phenotype and expression data in patient 2 are compatible with simple paternalization of the imprinted region.) In addition, since the apparently normal phenotype in mice homozygous for ΔGtl2-DMR is reminiscent of that in sheep homozygous for the callipyge mutation [32], a complicated mechanism(s) such as the polar overdominance may be operating in the ΔGtl2-DMR mice [33]. Thus, it remains to be clarified whether the MEG3/Gtl2-DMR has a similar or different function between the human and the mouse.

Two points should be made in reference to the present study. First, the proposed functions of the two DMRs are based on the results of single patients. This must be kept in mind, because there might be a hidden patient-specific abnormality or event that might explain the results. For example, the abnormal placental phenotype in patient 1 might be caused by some co-incidental aberration, and the apparently normal placenta in patient 2 might be due to mosaicism with grossly preserved MEG3-DMR in the placenta and grossly deleted MEG3-DMR in the body. Second,


Figure 5. Expression analysis. (A) Reverse transcriptase (RT)-PCR analysis. L: leukocytes; SF: skin fibroblasts; and P: placenta. The relatively weak GAPDH expression for the formalin-fixed and paraffin-embedded placenta of patient 1 indicates considerable mRNA degradation. Since a single exon was amplified for DLK1 and RTL1, PCR was performed with and without RT for the placenta of patient 1, to exclude the possibility of false positive results caused by genomic DNA contamination. (B) Quantitative real-time PCR (q-PCR) analysis of MEG3, MEG8, and miRNAs, using fresh skin fibroblasts (SF) of patient 2 and four control neonates. Of the examined MEGs, miR433 and miR127 are encoded by RTL1as. (C) RT-PCR analysis for the formalin-fixed and paraffin-embedded pituitary (Pit.) and the adrenal (Ad.) in patient 2. The bands for DLK1 are detected in the presence of RT and undetected in the absence of RT, thereby excluding contamination of genomic DNA. (D) Monoallelic MEG3 expression in the leukocytes of the mother of patient 1. The three cSNPs are present in a heterozygous status in gDNA and in a hemizygous status in cDNA. D: direct sequence. (E) Biparental RTL1 expression in the placenta of patient 1 and biparental DLK1 expression in the pituitary and adrenal of patient 2. D: direct sequence; and S: subcloned sequence. In patient 1, genotyping of RTL1 cSNP (rs6575805) using gDNA indicates maternal origin of the “C” allele and paternal origin of the “T” allele, and sequencing analysis using cDNA confirms expression of maternally as well as paternally derived RTL1. Similarly, in patient 2, genotyping of DLK1 cSNP (rs1802710) using gDNA denotes maternal origin of the “C” allele and paternal origin of the “T” allele, and sequencing analysis using cDNA confirms expression of maternally as well as paternally inherited DLK1. doi:10.1371/journal.pgen.1000992.g005

the clinical features in the mother of patient 1 such as short stature and obesity are often observed in cases with upd(14)mat (Table S2). However, the clinical features are non-specific and appear to be irrelevant to the microdeletion involving the IG-DMR, because loss of the paternally derived IG-DMR does not affect the imprinted status [2,26]. Indeed, MEG3 in the mother of patient 1 showed normal monoallelic expression in the presence of the differentially methylated MEG3-DMR. Nevertheless, since the upd(14)mat phenotype is primarily ascribed to loss of functional DLK1 (Figure S3B) [2,34], it might be possible that the microdeletion involving the IG-DMR has affected a cis-acting regulatory element for DLK1 expression (for details, see Note in the legend for Table S2). Further studies in cases with similar microdeletions will permit clarification of these two points.

In summary, the results show a hierarchical interaction and distinct functional properties of the IG-DMR and the MEG3-DMR in imprinting control. Thus, this study provides a significant advance in the clarification of mechanisms involved in the imprinting regulation at the 14q32.2 imprinted region and the development of upd(14) phenotype.


Figure 6. Schematic representation of the observed and predicted methylation and expression patterns. Deleted regions in patients 1 and 2 and the mother of patient 1 are indicated by stippled rectangles. P: paternally derived chromosome; and M: maternally derived chromosome. Representative imprinted genes are shown; these genes are known to be imprinted in the body and the placenta [2] (see also Figure S2). Placental samples have not been obtained in patient 2 and the mother of patient 1 (highlighted with light green backgrounds). Thick arrows for RTL1 in patients 1 and 2 represent increased RTL1 expression that is ascribed to loss of functional microRNA-containing RTL1as as a repressor for RTL1 [26,36–38]; this phenomenon has been indicated in placentas with upd(14)pat and in those with an epimutation and a microdeletion involving the two DMRs (Figure S3A and S3C) [2]. MEG3 and RTL1as that are disrupted or predicted to have become silent on the maternally derived chromosome are written in gray. Filled and open circles represent hypermethylated and hypomethylated DMRs, respectively; since the MEG3-DMR is rather hypomethylated and regarded as non-DMR in the placenta [2] (see also Figure 3), it is painted in gray. doi:10.1371/journal.pgen.1000992.g006

Materials and Methods

Ethics statement

This study was approved by the Institutional Review Board Committees at National Center for Child Health and Development, University College Dublin, and Dokkyo University School of Medicine, and performed after obtaining written informed consent.

Primers

All the primers utilized in this study are summarized in Table S3.

Sample preparation

For leukocytes and skin fibroblasts, genomic DNA (gDNA) samples were extracted with FlexiGene DNA Kit (Qiagen), and RNA samples were prepared with RNeasy Plus Mini (Qiagen) for DLK1, MEG3, RTL1, MEG8 and snoRNAs, and with mirVana miRNA Isolation Kit (Ambion) for microRNAs. For paraffin-embedded tissues including the placenta, brain, lung, heart, liver, spleen, kidney, bladder, and small intestine, gDNA and RNA samples were extracted with RecoverAll Total Nucleic Acids Isolation Kit (Ambion) using slices of 40 μm thickness. For fresh control placental samples, gDNA and RNA were extracted using ISOGEN (Nippon Gene). After treating total RNA samples with


Table 2. Clinical and molecular findings in wild-type and PatDi(12) mice and mice with maternally inherited ΔIG-DMR and ΔGtl2-DMR.

Wildtype

ΔIG-DMR (~4.15 kb)a

PatDi(12)

ΔGtl2-DMR (~10 kb)b Neomycin cassette (+)

<Body> Phenotype

Normal

Abnormalc

PatDi(12) phenotypec

Normal at birth Lethal by 4 weeks

Methylation pattern IG-DMR Gtl2-DMR

Methylated

Methylatedd

Differential

Methylated

Epimutated

e

Monoallelic

Increased (,2x)

Biparental

Differential

Differential Methylatedd

Expression pattern Pegs

Grossly normal

Increased (2x or 4.5x)f Monoallelic

Absent

Absent

Decreased (,0.2,0.5x)g

Normal

Placentomegaly

Apparently normal

Not determined

IG-DMR

Differential

Methylated

Not determined

Not determined

Gtl2-DMR

Non-DMR

Non-DMR

Not determined

Not determined

Monoallelic

Not determined

Increased (1.5,1.8x)g

Megs <Placenta> Phenotype Methylation pattern

Expression pattern Pegs Megs

Monoallelic

Not determined

Decreased (0.6,0.8x)

Remark

Decreased (0.5,0.85x)g

g

Decreased (,0.1,1.0)g

h

Paternal transmissioni

Paternal transmission

Biparental transmissionj

a The deletion size is smaller than that of patient 1 and her mother in this study, especially at the centromeric region. b The microdeletion also involves Gtl2, and the deletion size is larger than that of patient 2 in this study. c Body phenotype includes bell-shaped thorax with rib anomalies, distended abdomen, and short and broad neck. d Hemizygosity for the methylated DMR of paternal origin. e Hypermethylation of the maternally derived DMR. f 2x Dlk1 and Dio3 expression levels and 4.5x Rtl1 expression level. The markedly elevated Rtl1 expression level is ascribed to a synergic effect between activation of the usually silent Rtl1 of maternal origin and loss of functional microRNA-containing Rtl1as as a repressor for Rtl1 [26,36–38]. g The expression level is variable among examined tissues and examined genes. h The ΔIG-DMR of paternal origin has permitted normal Gtl2-DMR methylation pattern, intact imprinting status, and normal phenotype in the body (no data on the placenta). i The ΔGtl2-DMR of paternal origin is accompanied by normal methylation pattern of the IG-DMR and variably reduced Pegs expression and increased Megs expression in the body, and has yielded severe growth retardation accompanied by perinatal lethality. j The homozygous mutants have survived and developed into fertile adults, despite rather altered expression patterns of the imprinted genes. doi:10.1371/journal.pgen.1000992.t002

DNase, cDNA samples for DLK1, MEG3, MEG8, and snoRNAs were prepared with oligo(dT) primers from 1 μg of RNA using Superscript III Reverse Transcriptase (Invitrogen), and those for microRNAs were synthesized from 300 ng of RNA using TaqMan MicroRNA Reverse Transcription Kit (Applied Biosystems). For RTL1, cDNA samples were synthesized with RTL1-specific primers that do not amplify RTL1as. Control gDNA and cDNA samples were extracted from adult leukocytes and neonatal skin fibroblasts purchased from Takara Bio Inc. Japan, and from a fresh placenta of 38 weeks of gestation. Metaphase spreads were prepared from leukocytes and skin fibroblasts using colcemid (Invitrogen).

Structural analysis

Microsatellite analysis and SNP genotyping were performed as described previously [2]. For FISH analysis, metaphase spreads were hybridized with a 5,104 bp FISH-1 probe and a 5,182 bp FISH-2 probe produced by long PCR, together with an RP11-566I2 probe for 14q12 used as an internal control [2]. The FISH-1 and FISH-2 probes were labeled with digoxigenin and detected by rhodamine anti-digoxigenin, and the RP11-566I2 probe was labeled with biotin and detected by avidin conjugated to fluorescein isothiocyanate. For quantitative real-time PCR analysis, the relative copy number to RNaseP (catalog No: 4316831, Applied Biosystems) was determined by the Taqman real-time PCR method using the probe-primer mix on an ABI PRISM 7000 (Applied Biosystems). To determine the breakpoints of microdeletions, sequence analysis was performed for long PCR products harboring the fusion points, using serial forward primers on the CEQ 8000 autosequencer (Beckman Coulter). Direct sequencing was also performed on the CEQ 8000 autosequencer. Oligoarray comparative genomic hybridization was performed with 1×244K Human Genome Array (catalog No: G4411B) (Agilent Technologies), according to the manufacturer’s protocol.
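The text states that copy number was measured relative to RNaseP but does not give the calculation. As a hedged illustration only, the sketch below assumes the widely used comparative Ct approach (2 to the power of minus ΔΔCt), roughly 100% amplification efficiency, and a calibrator sample known to carry two copies. The function name and every Ct value are hypothetical and are not taken from the study.

```python
def relative_copy_number(
    ct_target: float,
    ct_reference: float,
    ct_target_calibrator: float,
    ct_reference_calibrator: float,
    calibrator_copies: int = 2,
) -> float:
    """Estimate copy number by the comparative Ct method (assumed, not from the paper)."""
    # Normalize the target to the reference gene (RNaseP in the paper) in each sample.
    delta_sample = ct_target - ct_reference
    delta_calibrator = ct_target_calibrator - ct_reference_calibrator
    # One extra cycle relative to the calibrator corresponds to half the template.
    delta_delta = delta_sample - delta_calibrator
    return calibrator_copies * 2.0 ** -delta_delta


# Hypothetical Ct values: the target crosses threshold one cycle later
# in the test sample than in the two-copy calibrator.
estimate = relative_copy_number(26.0, 24.0, 25.0, 24.0)
print(round(estimate, 2))
```

With these invented values the estimate is one copy, the pattern expected for a heterozygous deletion. Real data would need replicate measurements and a check of amplification efficiency.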

Methylation analysis Methylation analysis was performed for gDNA treated with bisulfite using the EZ DNA Methylation Kit (Zymo Research). After PCR amplification using primer sets that hybridize both methylated and unmethylated clones because of lack of CpG 11

June 2010 | Volume 6 | Issue 6 | e1000992


Imprinting Control Centers at Human 14q32.2

normal subject. Genotyping has been carried out for RTL1 cSNP using gDNA and cDNA samples of a fresh placenta and gDNA sample from the mother, showing that both maternally and nonmaternally (paternally) derived alleles are delineated in the gDNA, whereas a non-maternally (paternally) inherited allele alone is detected in cDNA. This cSNP has also been examined in the placenta of patient 1 (Figure 5E). Furthermore, the results confirm that the primers utilized in this study have amplified RTL1, but not RTL1as. Found at: doi:10.1371/journal.pgen.1000992.s002 (0.39 MB TIF)

dinucleotides within the primer sequences, the PCR products were digested with appropriate restriction enzymes for combined bisulfite restriction analysis. For bisulfite sequencing, the PCR products were subcloned with TOPO TA Cloning Kit (Invitrogen) and subjected to direct sequencing on the CEQ 8000 autosequencer.

Expression analysis Standard RT-PCR was performed for DLK1, RTL1, MEG3, MEG8, and snoRNAs using primers hybridizing to exonic or transcribed sequences, and one ml of PCR reaction solutions was loaded onto Gel-Dye Mix (Agilent). Taqman real-time PCR was carried out using the probe-primer mixtures (assay No: Hs00292028 for MEG3 and Hs00419701 for MEG8; assay ID: 001028 for miR433, 000452 for miR127, 000568 for miR379, and 000477 for miR154) on the ABI PRISM 7000. Data were normalized against GAPDH (catalog No: 4326317E) for MEG3 and MEG8 and against RNU48 (assay ID: 0010006) for the remaining miRs. The expression studies were performed three times for each sample. To examine the imprinting status of MEG3 in the leukocytes of the mother of patient 1, direct sequence data for informative cSNPs were compared between gDNA and cDNA. To analyze the imprinting status of RTL1 in the placental sample of patient 1 and that of DLK1 in the pituitary and adrenal samples of patient 2, RTPCR products containing exonic cSNPs informative for the parental origin were subcloned with TOPO TA Cloning Kit, and multiple clones were subjected to direct sequencing on the CEQ 8000 autosequencer. Furthermore, MEG3 expression pattern was examined using leukocyte gDNA and cDNA samples from multiple normal subjects and leukocyte gDNA samples from their mothers, and RTL1 expression pattern was analyzed using gDNA and cDNA samples from multiple fresh normal placentas and leukocyte gDNA from the mothers.

Figure S3 Schematic representation of the observed and predicted methylation and expression patterns in previously reported cases with upd(14)pat/mat-like phenotypes and in normal and upd(14)pat/mat subjects. For the explanations of the illustrations, see the legend for Figure 6. Previous studies have indicated that (1) Epimutation-1, Deletion-1, Deletion-2, and Deletion-3 lead to maternal to paternal epigenotypic alteration; (2) Epimutation-2 results in paternal to maternal epigenotypic alteration; and (3) Deletion-4 and Deletion-5 have no effect on the epigenotypic status [2,5–8,26]. (A) Cases with typical or mild upd(14)pat phenotype. Epimutation-1: Hypermethylation of the IG-DMR and the MEG3-DMR of maternal origin in the body, and that of the IG-DMR of maternal origin in the placenta (the MEG3-DMR is rather hypomethylated in the placenta) (cases 6–8 in Kagami et al. [2]). Deletion-1: Microdeletion involving DLK1, the two DMRs, and MEG3 on the maternally inherited chromosome (case 2 in Kagami et al. [2]). Deletion-2: Microdeletion involving DLK1, the two DMRs, MEG3, RTL1, and RTL1as on the maternally inherited chromosome (cases 3 and 5 in Kagami et al. [2]). Deletion-3: Microdeletion involving the two DMRs, MEG3, RTL1, and RTL1as on the maternally inherited chromosome (case 4 in Kagami et al. [2]). These findings are explained by the following notions: (1) Epimutation (hypermethylation) of the normally hypomethylated IG-DMR of maternal origin directly results in paternalization of the imprinted region in the placenta and indirectly leads to paternalization of the imprinted region in the body via epimutation (hypermethylation) of the usually hypomethylated MEG3-DMR of maternal origin. 
Thus, the epimutation (hypermethylation) is predicted to have impaired the IG-DMR as the primary target, followed by the epimutation (hypermethylation) of the MEG3-DMR after fertilization; (2) Loss of the hypomethylated MEG3-DMR of maternal origin leads to paternalization of the imprinted region in the body; and (3) Loss of the hypomethylated IG-DMR of maternal origin results in paternalization of the imprinted region in the placenta. Furthermore, epigenotype-phenotype correlations imply that the severity of upd(14)pat phenotype is primarily determined by the RTL1 expression dosage rather than the DLK1 expression dosage [2]. (B) Cases with upd(14)mat-like phenotype. Epimutation-2: Hypomethylation of the IG-DMR and the MEG3-DMR of paternal origin (Temple et al. [5], Buiting et al. [6], Hosoki et al. [7], and Zechner et al. [8]). Deletion-4: Microdeletion involving DLK1, the two DMRs, and MEG3 on the paternally inherited chromosome (cases 9 and 10 in Kagami et al. [2]). Deletion-5: Microdeletion involving DLK1, the two DMRs, MEG3, RTL1, and RTL1as on the paternally inherited chromosome (case 11 in Kagami et al. [2] and patient 3 in Buiting et al. [6]). These findings are consistent with the following notions: (1) Epimutation (hypomethylation) of the normally hypermethylated IG-DMR of paternal origin directly results in maternalization of the imprinted region in the placenta and indirectly leads to maternalization of the imprinted region in the body through epimutation (hypomethylation) of the usually hypermethylated MEG3-DMR of paternal origin. Thus, epimutation (hypomethylation) is predicted to have affected the IG-DMR

Supporting Information
Figure S1 Structural analysis. (A) Quantitative real-time PCR analysis (q-PCR) for four regions (q-PCR-1 to q-PCR-4) in patient 2. The q-PCR-1 and q-PCR-2 regions are present in two copies, whereas the q-PCR-3 and q-PCR-4 regions are present in a single copy in patient 2. The four regions are present in two copies in the parents and a control subject, in a single copy in the two previously reported patients with microdeletions involving the examined regions (Deletion-1 and Deletion-2 are case 2 and case 3 in Kagami et al. [2], respectively), and in three copies in a hitherto unreported case with 46,XX,der(17)t(14;17)(q32.2;p13)pat who has three copies of the 14q32.2 imprinted region. Since the microsatellite locus D14S985 is present in two copies (Table S1) and the MEG3-DMR is deleted (Figure 2) in patient 2, this has served to localize the breakpoints. (B) Oligoarray comparative genomic hybridization for a ~1 Mb imprinted region. All the signals remain within the normal range (−1 SD to +1 SD) (shaded in light blue) in patients 1 and 2. Found at: doi:10.1371/journal.pgen.1000992.s001 (1.17 MB TIF)
Figure S2 Expression analysis. (A) Maternal MEG3 expression in the leukocytes of normal subjects. Genotyping has been performed for three cSNPs using genomic DNA (gDNA) and cDNA of leukocytes from control subjects and gDNA samples of their mothers, indicating that both maternally and non-maternally (paternally) derived alleles are delineated in the gDNA, whereas maternally inherited alleles alone are identified in cDNA. These three cSNPs have also been studied in the mother of patient 1 (Figure 5D). (B) Paternal RTL1 expression in the placenta of a


June 2010 | Volume 6 | Issue 6 | e1000992


Imprinting Control Centers at Human 14q32.2

Found at: doi:10.1371/journal.pgen.1000992.s004 (0.19 MB DOC)

as the primary target, followed by the epimutation (hypomethylation) of the MEG3-DMR after fertilization; and (2) Loss of the hypermethylated DMRs of paternal origin has no effect on the imprinting status [2,26], so that upd(14)mat-like phenotype is primarily ascribed to the additive effects of loss of functional DLK1 and RTL1 from the paternally derived chromosome (the effect of loss of DIO3 appears to be minor, if any [2,35]). Although the MEGs expression dosage is predicted to be normal in Deletion-4 and Deletion-5 and doubled in Epimutation-2 as well as in upd(14)mat, it remains to be determined whether the difference in the MEGs expression dosage has major clinical effects or not. (C) Normal and upd(14)pat/mat subjects. Found at: doi:10.1371/journal.pgen.1000992.s003 (2.72 MB TIF)

Table S2 Clinical features in the mother of patient 1. Found at: doi:10.1371/journal.pgen.1000992.s005 (0.09 MB DOC) Table S3 Primers utilized in the present study. Found at: doi:10.1371/journal.pgen.1000992.s006 (0.14 MB DOC)

Author Contributions Conceived and designed the experiments: MK ACFS TO. Performed the experiments: MK MF KM FK. Analyzed the data: MK TO. Contributed reagents/materials/analysis tools: MJO AJG YW OA NM KM TO. Wrote the paper: TO.

Table S1 The results of microsatellite and SNP analyses.

References
1. da Rocha ST, Edwards CA, Ito M, Ogata T, Ferguson-Smith AC (2008) Genomic imprinting at the mammalian Dlk1-Dio3 domain. Trends Genet 24: 306–316.
2. Kagami M, Sekita Y, Nishimura G, Irie M, Kato F, et al. (2008) Deletions and epimutations affecting the human 14q32.2 imprinted region in individuals with paternal and maternal upd(14)-like phenotypes. Nat Genet 40: 237–242.
3. Kagami M, Yamazawa K, Matsubara K, Matsuo N, Ogata T (2008) Placentomegaly in paternal uniparental disomy for human chromosome 14. Placenta 29: 760–761.
4. Kotzot D (2004) Maternal uniparental disomy 14 dissection of the phenotype with respect to rare autosomal recessively inherited traits, trisomy mosaicism, and genomic imprinting. Ann Genet 47: 251–260.
5. Temple IK, Shrubb V, Lever M, Bullman H, Mackay DJ (2007) Isolated imprinting mutation of the DLK1/GTL2 locus associated with a clinical presentation of maternal uniparental disomy of chromosome 14. J Med Genet 44: 637–640.
6. Buiting K, Kanber D, Martín-Subero JI, Lieb W, Terhal P, et al. (2008) Clinical features of maternal uniparental disomy 14 in patients with an epimutation and a deletion of the imprinted DLK1/GTL2 gene cluster. Hum Mutat 29: 1141–1146.
7. Hosoki K, Ogata T, Kagami M, Tanaka T, Saitoh S (2008) Epimutation (hypomethylation) affecting the chromosome 14q32.2 imprinted region in a girl with upd(14)mat-like phenotype. Eur J Hum Genet 16: 1019–1023.
8. Zechner U, Kohlschmidt N, Rittner G, Damatova N, Beyer V, et al. (2009) Epimutation at human chromosome 14q32.2 in a boy with a upd(14)mat-like clinical phenotype. Clin Genet 75: 251–258.
9. Li E, Beard C, Jaenisch R (1993) Role for DNA methylation in genomic imprinting. Nature 366: 362–365.
10. Rosa AL, Wu YQ, Kwabi-Addo B, Coveler KJ, Reid Sutton V, et al. (2005) Allele-specific methylation of a functional CTCF binding site upstream of MEG3 in the human imprinted domain of 14q32. Chromosome Res 13: 809–818.
11. Wylie AA, Murphy SK, Orton TC, Jirtle RL (2000) Novel imprinted DLK1/GTL2 domain on human chromosome 14 contains motifs that mimic those implicated in IGF2/H19 regulation. Genome Res 10: 1711–1718.
12. Tierling S, Dalbert S, Schoppenhorst S, Tsai CE, Oliger S, et al. (2007) High-resolution map and imprinting analysis of the Gtl2-Dnchc1 domain on mouse chromosome 12. Genomics 87: 225–235.
13. Takada S, Paulsen M, Tevendale M, Tsai CE, Kelsey G, et al. (2002) Epigenetic analysis of the Dlk1-Gtl2 imprinted domain on mouse chromosome 12: implications for imprinting control from comparison with Igf2-H19. Hum Mol Genet 11: 77–86.
14. Ohlsson R, Renkawitz R, Lobanenkov V (2001) CTCF is a uniquely versatile transcription regulator linked to epigenetics and disease. Trends Genet 17: 520–527.
15. Hark AT, Schoenherr CJ, Katz DJ, Ingram RS, Levorse JM, et al. (2000) CTCF mediates methylation-sensitive enhancer-blocking activity at the H19/Igf2 locus. Nature 405: 486–489.
16. Kanduri C, Pant V, Loukinov D, Pugacheva E, Qi CF, et al. (2000) Functional association of CTCF with the insulator upstream of the H19 gene is parent of origin-specific and methylation-sensitive. Curr Biol 10: 853–856.
17. da Rocha ST, Tevendale M, Knowles E, Takada S, Watkins M, et al. (2007) Restricted co-expression of Dlk1 and the reciprocally imprinted non-coding RNA, Gtl2: implications for cis-acting control. Dev Biol 306: 810–823.
18. Wan LB, Pan H, Hannenhalli S, Cheng Y, Ma J, et al. (2008) Maternal depletion of CTCF reveals multiple functions during oocyte and preimplantation embryo development. Development 135: 2729–2738.
19. Ideraabdullah FY, Vigneau S, Bartolomei MS (2008) Genomic imprinting mechanisms in mammals. Mutat Res 647: 77–85.
20. Fitzpatrick GV, Pugacheva EM, Shin JY, Abdullaev Z, Yang Y, et al. (2007) Allele-specific binding of CTCF to the multipartite imprinting control region KvDMR1. Mol Cell Biol 27: 2636–2647.
21. Horsthemke B, Wagstaff J (2008) Mechanisms of imprinting of the Prader-Willi/Angelman region. Am J Med Genet A 146A: 2041–2052.
22. Lin SP, Coan P, da Rocha ST, Seitz H, Cavaille J, et al. (2007) Differential regulation of imprinting in the murine embryo and placenta by the Dlk1-Dio3 imprinting control region. Development 134: 417–426.
23. Coan PM, Burton GJ, Ferguson-Smith AC (2005) Imprinted genes in the placenta–a review. Placenta 26 Suppl A: S10–20.
24. Georgiades P, Watkins M, Surani MA, Ferguson-Smith AC (2000) Parental origin-specific developmental defects in mice with uniparental disomy for chromosome 12. Development 127: 4719–4728.
25. Takada S, Tevendale M, Baker J, Georgiades P, Campbell E, et al. (2000) Delta-like and gtl2 are reciprocally expressed, differentially methylated linked imprinted genes on mouse chromosome 12. Curr Biol 10: 1135–1138.
26. Lin SP, Youngson N, Takada S, Seitz H, Reik W, et al. (2003) Asymmetric regulation of imprinting on the maternal and paternal chromosomes at the Dlk1-Gtl2 imprinted cluster on mouse chromosome 12. Nat Genet 35: 97–102.
27. Takahashi N, Okamoto A, Kobayashi R, Shirai M, Obata Y, et al. (2009) Deletion of Gtl2, imprinted non-coding RNA, with its differentially methylated region induces lethal parent-origin-dependent defects in mice. Hum Mol Genet 18: 1879–1888.
28. Lewis A, Mitsuya K, Umlauf D, Smith P, Dean W, et al. (2004) Imprinting on distal chromosome 7 in the placenta involves repressive histone methylation independent of DNA methylation. Nat Genet 36: 1291–1295.
29. Umlauf D, Goto Y, Cao R, Cerqueira F, Wagschal A, et al. (2004) Imprinting along the Kcnq1 domain on mouse chromosome 7 involves repressive histone methylation and recruitment of Polycomb group complexes. Nat Genet 36: 1296–1300.
30. Sekita Y, Wagatsuma H, Irie M, Kobayashi S, Kohda T, et al. (2006) Aberrant regulation of imprinted gene expression in Gtl2lacZ mice. Cytogenet Genome Res 113: 223–229.
31. Steshina EY, Carr MS, Glick EA, Yevtodiyenko A, Appelbe OK, et al. (2006) Loss of imprinting at the Dlk1-Gtl2 locus caused by insertional mutagenesis in the Gtl2 5′ region. BMC Genet 7: 44.
32. Charlier C, Segers K, Karim L, Shay T, Gyapay G, et al. (2001) The callipyge mutation enhances the expression of coregulated imprinted genes in cis without affecting their imprinting status. Nat Genet 27: 367–369.
33. Georges M, Charlier C, Cockett N (2003) The callipyge locus: evidence for the trans interaction of reciprocally imprinted genes. Trends Genet 19: 248–252.
34. Moon YS, Smas CM, Lee K, Villena JA, Kim KH, et al. (2002) Mice lacking paternally expressed Pref-1/Dlk1 display growth retardation and accelerated adiposity. Mol Cell Biol 22: 5585–5592.
35. Tsai CE, Lin SP, Ito M, Takagi N, Takada S, et al. (2002) Genomic imprinting contributes to thyroid hormone metabolism in the mouse embryo. Curr Biol 12: 1221–1226.
36. Sekita Y, Wagatsuma H, Nakamura K, Ono R, Kagami M, et al. (2008) Role of retrotransposon-derived imprinted gene, Rtl1, in the feto-maternal interface of mouse placenta. Nat Genet 40: 243–248.
37. Seitz H, Youngson N, Lin SP, Dalbert S, Paulsen M, et al. (2003) Imprinted microRNA genes transcribed antisense to a reciprocally imprinted retrotransposon-like gene. Nat Genet 34: 261–262.
38. Davis E, Caiment F, Tordoir X, Cavaillé J, Ferguson-Smith A, et al. (2005) RNAi-mediated allelic trans-interaction at the imprinted Rtl1/Peg11 locus. Curr Biol 15: 743–749.

PLoS Genetics | www.plosgenetics.org



siRNA–Mediated Methylation of Arabidopsis Telomeres Jan Vrbsky1,2.¤a, Svetlana Akimcheva1., J. Matthew Watson1., Thomas L. Turner3, Lucia Daxinger1¤b, Boris Vyskot2, Werner Aufsatz1, Karel Riha1* 1 Gregor Mendel Institute of Molecular Plant Biology, Austrian Academy of Sciences, Vienna, Austria, 2 Institute of Biophysics, Czech Academy of Sciences, Brno, Czech Republic, 3 Ecology, Evolution, and Marine Biology Department, University of California Santa Barbara, Santa Barbara, California, United States of America

Abstract
Chromosome termini form a specialized type of heterochromatin that is important for chromosome stability. The recent discovery of telomeric RNA transcripts in yeast and vertebrates raised the question of whether RNA–based mechanisms are involved in the formation of telomeric heterochromatin. In this study, we performed detailed analysis of chromatin structure and RNA transcription at chromosome termini in Arabidopsis. Arabidopsis telomeres display features of intermediate heterochromatin that does not extensively spread to subtelomeric regions, which encode transcriptionally active genes. We also found telomeric repeat–containing transcripts arising from telomeres and centromeric loci, a portion of which are processed into small interfering RNAs. These telomeric siRNAs contribute to the maintenance of telomeric chromatin through promoting methylation of asymmetric cytosines in telomeric (CCCTAAA)n repeats. The formation of telomeric siRNAs and methylation of telomeres rely on the RNA–dependent DNA methylation pathway. The loss of telomeric DNA methylation in rdr2 mutants is accompanied by only a modest effect on histone heterochromatic marks, indicating that maintenance of telomeric heterochromatin in Arabidopsis is reinforced by several independent mechanisms. In conclusion, this study provides evidence for an siRNA–directed mechanism of chromatin maintenance at telomeres in Arabidopsis.
Citation: Vrbsky J, Akimcheva S, Watson JM, Turner TL, Daxinger L, et al. (2010) siRNA–Mediated Methylation of Arabidopsis Telomeres. PLoS Genet 6(6): e1000986. doi:10.1371/journal.pgen.1000986
Editor: Tetsuji Kakutani, National Institute of Genetics, Japan
Received February 4, 2010; Accepted May 12, 2010; Published June 10, 2010
Copyright: © 2010 Vrbsky et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work was supported by GEN-AU Austria (GZ200.140/1-VI/12006, http://www.gen-au.at/), The Czech Ministry of Education (LC06004, http://www.msmt.cz/), and an NSF International Research Fellowship (OISE-0700946, http://www.nsf.gov/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing Interests: The authors have declared that no competing interests exist.
* E-mail: karel.riha@gmi.oeaw.ac.at
. These authors contributed equally to this work.
¤a Current address: National Center for Biomolecular Research, Masaryk University, Brno, Czech Republic
¤b Current address: Queensland Institute of Medical Research, Brisbane, Australia

Introduction
Telomeres safeguard the stability of eukaryotic chromosomes by protecting natural chromosome ends from triggering DNA damage responses. Chromosome termini consist of telomeric and subtelomeric repeats that are bound by a specific set of telomere binding proteins as well as nucleosomes that exhibit features of pericentric heterochromatin [1]. These regions are usually devoid of functional genes, and transgenes integrated in the vicinity of telomeres are subjected to transcriptional silencing, a phenomenon known as telomere position effect [2]. Studies in mammals indicate that telomeric heterochromatin plays an important function in chromosome end protection and telomere length regulation. Inactivation of the SIRT6 histone deacetylase in human cells causes hyperacetylation of telomeric histone H3, telomere dysfunction and premature cell senescence [3]. Deficiency in histone methyltransferases or the retinoblastoma tumor suppressor leads to disruption of telomeric heterochromatin and aberrant telomere elongation in mouse cells [4–6]. Another important hallmark of heterochromatin in mammals is DNA methylation. Although vertebrate telomeric DNA does not appear to be methylated due to the lack of canonical CG sites, subtelomeric repeats are heavily methylated [7]. Interestingly, inactivation of DNA methyltransferases in mouse cells decreases 5-methylcytosine at subtelomeres and leads to increased telomeric recombination, without a concomitant change in histone modifications [7]. These data indicate a functional interaction between subtelomeric and telomeric chromatin.

Heterochromatin was thought to be transcriptionally inactive, but this view has been challenged by discoveries of numerous noncoding (nc) transcripts derived from heterochromatic loci. Some of these transcripts directly contribute to the assembly of heterochromatin at defined chromosomal domains and their biogenesis is vital for processes such as X chromosome inactivation, genomic imprinting, transposon silencing and centromere function [8]. Thus, it is not surprising that although telomeres possess marks of repressive heterochromatin, they are not transcriptionally silent. Recent studies revealed the presence of telomeric repeat-containing RNAs (TERRA) that are transcribed from subtelomeric regions in yeast and vertebrates [9–11]. TERRA are removed from telomeres either through Rat1p-dependent degradation in budding yeast or through nonsense-mediated RNA decay (NMD) in humans; deficiencies in these RNA processing pathways have dramatic effects on telomere maintenance [9,10]. Hypomethylation of subtelomeric regions in mammalian cells lacking DNA methyltransferases leads to the overproduction of TERRA [11,12]. This suggests that the epigenetic status of subtelomeres and telomeres influences TERRA expression.


June 2010 | Volume 6 | Issue 6 | e1000986


Modulation of Telomeric Chromatin by siRNA

Author Summary
Telomeres are protein–DNA structures that protect the ends of eukaryotic chromosomes. A failure in this protective structure can lead to chromosomal instabilities and contribute to cancer and aging. The protective nature of telomeres relies on complex interactions between repetitive telomeric DNA and associated proteins. One major question is how telomeric proteins, including telomere-associated nucleosomes, are modified in order to achieve this protection. In this study, we have discovered that Arabidopsis telomeric nucleosomes contain a unique mixture of both active and inactive chromatin marks. Additionally, the telomeric DNA itself is modified by methylation of cytosines within the telomeric repeat. Regulation of DNA methylation is achieved by telomeric repeat–containing small RNAs, which are derived from the processing of telomeric transcripts by the RNA–dependent DNA methylation pathway. From these data, we infer that the formation of a proper telomere structure is partly regulated by non-coding telomeric RNAs.

The discovery of TERRA raised the question of whether ncRNAs contribute to the establishment of telomeric heterochromatin. This hypothesis gained support in a recent study in which downregulation of TERRA by exogenous short interfering RNAs (siRNAs) in human cell lines led to depletion of histone heterochromatic modification from telomeres [13]. In many organisms, RNA-mediated chromatin silencing relies on small RNA molecules that guide effector complexes to target sites [8,14]. However, involvement of small RNAs in chromatin formation at canonical telomeres has not been shown yet. In this study, we investigate chromatin organization and transcription at chromosome ends in the model plant Arabidopsis thaliana. We detect the presence of transcripts containing telomeric repeats and show that some of these transcripts are processed into ~24 nt siRNAs. These transcripts are produced from telomeres as well as from intrachromosomal telomeric loci that are mainly located at centromeres. The 24 nt siRNAs are generated through the RNA-dependent DNA methylation (RdDM) pathway, which is a plant-specific mechanism that utilizes siRNAs to guide DNA methyltransferases to asymmetric cytosines (CNN) [15,16]. We demonstrate that RdDM is responsible for methylation of telomeric DNA that contains cytosines exclusively in asymmetric sequence contexts and hence for reinforcement of heterochromatic marks at telomeres.
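The statement that telomeric DNA carries cytosines only in asymmetric contexts can be checked directly on the repeat sequence. The sketch below classifies each cytosine of a strand as CG, CHG, or CHH (H = A, C, or T), the latter corresponding to the asymmetric context written CNN in the text. The function name and the choice of 13 repeat units are illustrative assumptions.

```python
def cytosine_contexts(seq):
    """Count cytosines on the given strand by context: CG, CHG, or CHH.

    Cytosines within two bases of the 3' end are skipped because their
    context cannot be determined from the sequence alone.
    """
    seq = seq.upper()
    counts = {"CG": 0, "CHG": 0, "CHH": 0}
    for i in range(len(seq) - 2):
        if seq[i] != "C":
            continue
        if seq[i + 1] == "G":
            counts["CG"] += 1       # symmetric CG site
        elif seq[i + 2] == "G":
            counts["CHG"] += 1      # symmetric CHG site
        else:
            counts["CHH"] += 1      # asymmetric site
    return counts

telomere_c_strand = "CCCTAAA" * 13   # arbitrary number of repeat units
print(cytosine_contexts(telomere_c_strand))
```

The complementary TTTAGGG strand contains no cytosine at all, so every cytosine in a perfect telomeric array falls into the asymmetric class.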

Figure 1. Expression of Arabidopsis chromosome-terminal genes. (A) A diagram of gene arrangement at the ends of five Arabidopsis chromosomes. Arrows illustrate the relative size and direction of transcripts of annotated terminal genes. The distance of the predicted ATG codon from the telomere is indicated. (B) Expression of the terminal genes in different tissues of wild-type plants assayed by RT-PCR. The size of the PCR products is indicated in parentheses. doi:10.1371/journal.pgen.1000986.g001

Results
Chromatin organization at Arabidopsis chromosome termini
Gene organization at chromosome ends in Arabidopsis appears to be unique. In contrast to the majority of organisms with known telomere/subtelomere sequences, 8 of the 10 Arabidopsis subtelomeres have no repetitive DNA, and predicted genes are annotated in the immediate vicinity of telomeres [17] (Figure 1A). We experimentally confirmed that sequences annotated as chromosome ends are indeed associated with telomeres for 7 chromosome arms with the exception of the right arm of chromosome 3 [18]. The two remaining chromosome termini contain clusters of ribosomal RNA genes (NORs) [19]. We performed reverse transcription (RT) PCR analysis to verify that all the predicted terminal genes are expressed and that they do not represent pseudogenes (Figure 1B). The genes showed distinct tissue-specific expression patterns and the size of the RT-PCR products corresponded to the predicted size of the spliced mRNAs. There was no obvious correlation between the level of expression and promoter distance from telomeres, and even the At2g48160 gene, with a promoter immediately adjacent to telomeric DNA, was robustly expressed. These data indicate that, in contrast to yeast and mammals, Arabidopsis telomeres do not silence genes located in their vicinity.

The high transcriptional activity near telomeres raised questions about the chromatin structure of chromosome termini in Arabidopsis. We investigated the distribution of histone modification marks typical for plant euchromatin (tri-methylation of histone H3 at Lys4, H3K4me3) and heterochromatin (di-methylation of H3 at Lys9, H3K9me2; and mono-methylation of H3 at Lys27, H3K27me1) at telomere-associated regions by chromatin immunoprecipitation (ChIP). The ~600 bp region immediately adjacent to the telomere on the right arm of chromosome 2 (2R) represents the promoter of the At2g48160 gene (Figure 2A) and carries typical euchromatic histone marks (Figure 2B). The H3K4me3 euchromatin mark was also dominant at the promoter of the At1g01010 gene that is located ~3.5 kb from the telomere on the left arm of chromosome 1 (region 1L-3, Figure 2A and 2B), although we could detect a weak H3K27me1 signal that is usually typical of heterochromatin. Histone heterochromatic marks (H3K9me2 and H3K27me1) became more pronounced at the 1L-2 and 1L-1 regions that are located on the same arm ~1.5 kb and 1 kb from the telomere, respectively (Figure 2A and 2B). The 1L telomere contains a recent 104 bp insertion of mitochondrial


Figure 2. Chromatin structure of Arabidopsis chromosome termini. (A) Schematic diagram of the 1L and 2R chromosome ends. Black boxes represent telomeric DNA. The red and blue bars span regions analyzed by bisulfite sequencing and ChIP PCR, respectively. The green bar indicates the region analyzed by ChIP qPCR. (B) ChIP PCR data showing the distribution of methylated histones at unique regions immediately adjacent to telomeres at the indicated chromosome arms. A euchromatic fragment of the MEE51 gene and a heterochromatic gypsy-like retrotransposon (At4g03770) were amplified from the ChIP fractions as a control. (C) Immunoprecipitated DNA analyzed by sequential dot-blot hybridization with a telomeric probe and the centromeric CEN180 probe. doi:10.1371/journal.pgen.1000986.g002

DNA embedded within the centromere-proximal region of telomeric repeats [20] (Figure 2A). Using this insertion to design primers that span the centromere-proximal part of the 1L telomere (1L-0, Figure 2A), we were able to demonstrate that this region also displays heterochromatin marks (Figure 2B). Nevertheless, the 1L-0 region still possessed clearly detectable H3K4me3, which is atypical of classical heterochromatin where the H3K4me3 modification is strongly reduced in comparison to H3K27me1 and H3K9me2. A similar histone-modification pattern was also observed in telomere-adjacent regions of five other chromosome arms (Figure 2B). To further examine chromatin at telomeres, we analyzed ChIP fractions by dot-blot hybridization with a telomeric probe (Figure 2C). The Arabidopsis genome is enriched for intrachromosomal degenerated telomeric repeats that are mainly localized at centromeres (Figure S1). To specifically assay for chromatin at telomeres, we used stringent hybridization conditions at which the centromere-derived signal is eliminated to less than 2% of the total telomeric signal (Figure S1). We readily detected H3K27me1 and H3K9me2 modifications, and a weaker but still clearly detectable H3K4me3 signal. This hybridization pattern was reminiscent of the results obtained by ChIP analysis of telomere-adjacent regions by PCR (Figure 2B). Thus, our ChIP data show that Arabidopsis telomeres form chromatin that is enriched for H3K9me2 and H3K27me1 heterochromatic marks, but still retains the euchromatic H3K4me3 modification. We found that the heterochromatin marks extend ~1.5 kb into the subtelomeric region of 1L. A survey of a high-resolution genome-wide map of H3K9me2 distribution indicates that H3K9me2 also spreads up to 1.5 kb from telomeres at chromosome arms 1R, 3L, 4R and 5L [21] (http://epigenomics.mcdb.ucla.edu/H3K9m2/).
However, detecting the prominent H3K4me3 signal side by side with the heterochromatic marks (Figure 2B and 2C) strongly indicates that Arabidopsis telomeres exhibit features of intermediate heterochromatin that is characterized by retention of opposing histone H3 methylation marks [22].

Identification of telomeric DNA–containing transcripts and siRNAs
We next asked whether Arabidopsis telomeres are transcribed by assaying for the presence of TERRA by Northern hybridization with a CCCTAAA probe. We readily detected two types of TERRA: heterogeneous transcripts, which ranged from high molecular weight strands that migrated at the limits of gel resolution down to hundreds of nucleotides, and several distinct bands (Figure 3A). We also detected antisense telomeric transcripts (ARRET), which gave a hybridization pattern with the complementary TTTAGGG probe similar to that of TERRA (Figure 3A). These signals disappeared after pretreatment of the samples with RNaseA (Figure 3B and data not shown), demonstrating that they do not represent remnants of DNA in RNA preparations. Expression of TERRA varied between RNA samples extracted from different tissues of Arabidopsis (Figure 3C). Interestingly, remarkable variation in expression was also detected between different Arabidopsis accessions, as the levels of TERRA in seedlings of Zur and Ws ecotypes were almost two orders of magnitude higher than in Col and Ler (Figure 3C). Arabidopsis TERRA and ARRET can originate at telomeres or arise from transcription of degenerated intrachromosomal telomeric sequences localized at centromeric regions (Figure S1). The bulk of centromeric DNA consists of 177–179 bp satellite repeats (CEN180), a subset of which is transcribed [23]. Sequential hybridization of a Northern blot with probes detecting TERRA and CEN180 resulted in an almost identical hybridization pattern, characterized by five distinct bands (Figure 3A). Hybridization of the blots with probes detecting sequences immediately adjacent to telomeres did not


Figure 3. Identification of TERRA and ARRET transcripts in Arabidopsis. (A) Northern blot analysis of wild-type RNA that was hybridized with a strand-specific TTTAGGG probe, stripped after exposure, and then sequentially rehybridized with CCCTAAA and the centromeric CEN180 probe. The gel stained with ethidium bromide (EtBr) is shown as a loading control. (B) Sensitivity of ARRET transcripts to RNaseA. (C) Northern blot detection of TERRA in different tissues of wild-type Col plants and in seedlings of the Ws, Zur and Ler ecotypes. The left and right parts of the membrane were exposed for 1 day and 2 h, respectively. (D) Detection of telomere-derived TERRA and ARRET transcripts by RT-PCR. The diagram outlines the strategy used for strand-specific RT-PCR at a hypothetical chromosome end (the telomere is indicated as a black box). The size of the expected PCR product for each telomere is indicated. As chromosome ends 1R and 4R contain a stretch of sequence homology, one set of primers was used to assay for the expression at both telomeres in one reaction. The resulting chromosome-end-specific products can be distinguished by their size. It is currently unknown whether the subtelomere sequence at the NOR-bearing chromosome end represents the 2L or 4L telomere. ARRET transcripts at this arm were not analyzed because they correspond to the nascent 45S rRNA. doi:10.1371/journal.pgen.1000986.g003

produce any detectable signal (data not shown). These results suggest that TERRA and ARRET transcripts detected by Northern analysis mainly arise from centromeric regions that contain remnants of telomeric DNA and not from the transcription of telomeres. To examine whether telomeres are transcribed at levels non-detectable by Northern hybridization, we analyzed expression of subtelomeric regions adjacent to telomeric DNA by strand-specific RT-PCR in flowers. We could distinguish expression of TERRA and ARRET by using either telomeric or subtelomeric arm-specific primers for reverse transcription (Figure 3D). We detected expression of both TERRA and ARRET at four out of eight analyzed chromosome arms. We failed to detect any transcription at chromosome arms 1R and 5R. Interestingly, only the TERRA but not ARRET transcripts were detected at 1L. The RT-PCR data demonstrate that at least five Arabidopsis telomeres are indeed transcribed, albeit at a low level. To gain further insights into telomere transcription, we cloned a ~500 nt promoter of the At2g48160 gene, which is located next to the telomere (Figure 1), in front of a reporter β-glucuronidase (GUS) gene in both sense and antisense orientations. We could detect GUS transcripts in transgenic plants carrying both constructs, although the expression in the antisense direction was much weaker than in the sense orientation (Figure S2). This experiment further supports the idea that telomere-adjacent regions can drive transcription into a telomere.

The presence of centromeric and telomeric TERRA and ARRET indicated that telomeric transcripts are able to form partially double-stranded (ds) intermediates that could be processed by a Dicer into siRNA. In support of this hypothesis, siRNAs corresponding to both strands of telomeric DNA were detected in wild-type plants (Figure 4A). We estimate the size of the telomeric C-rich strand siRNAs (C-siRNA) to be 24–25 nt, and the size of G-siRNAs to be 23–24 nt (Figure S3). The formation of 24 nt siRNAs in Arabidopsis is mediated by RNA-processing enzymes of the RdDM pathway [24]. This pathway is specific to plants and mediates methylation of cytosine residues in an asymmetric sequence context (CNN). The absence of telomeric 23–25 nt siRNAs in plants lacking RNA-dependent RNA polymerase 2 (RDR2), Dicer-like 3 (DCL3) or subunits of RNA Polymerase IV (NRPD1 or NRPD2) and their reduction in two other RdDM mutants (drd1 and nrpe1) further demonstrated that telomeric siRNAs belong to the category of 24 nt heterochromatic siRNAs (Figure 4A). These siRNAs are usually derived from heterochro-

June 2010 | Volume 6 | Issue 6 | e1000986


Modulation of Telomeric Chromatin by siRNA

and 5L (Figure 5, Table S2). Since these regions are formed by unique sequences, the origin of the siRNAs can be unambiguously traced to these loci. Interestingly, AGO4-associated siRNAs were particularly enriched at the chromosome ends that also exhibited expression of TERRA and ARRET (1L, 3L, 4R, 5L; Figure 5). These data strongly argue that telomeric TERRA and/or ARRET are processed into siRNAs.

The RdDM pathway mediates methylation of telomeric DNA

Plants can methylate cytosines in all sequence contexts, and DNA methylation at asymmetric positions relies largely on 24 nt siRNAs and on the RdDM pathway. The presence of telomeric siRNAs prompted us to ask whether telomeric DNA, which contains cytosines exclusively in the CNN context, can be methylated. We took advantage of the unique insertion in the 1L telomere that allowed us to design primers spanning 13 CCCTAAA repeats located in the centromere-proximal part of the 1L telomere (region 1L-0’; Figure 2A). Bisulfite sequencing of the 1L-0’ region in wild-type plants revealed that over 40% of cytosines in these telomeric repeats are methylated (Figure 6). In contrast, the 1L and 2R subtelomeric regions are devoid of DNA methylation (Figure S4). The telomeric methylation in 1L-0’ is non-randomly distributed, with preferential enrichment at the third cytosine in the CCCTAAA sequence (Figure 6A and 6B). A similar observation was recently made through whole genome bisulfite sequencing that also revealed methylation of telomeric repeats, albeit at a lower total frequency than reported here [30]. The level of 5-methylcytosine in all sequence contexts was dramatically reduced in rdr2 mutants, arguing that methylation of the 1L-0’ region primarily depends on the RdDM mechanism (Figure 6A and 6C). We next examined whether cytosine methylation and its dependence on the RdDM pathway is a general feature of telomeric DNA. We sequentially hybridized bisulfite-treated total genomic DNA to oligonucleotide probes that first detected fully converted telomeric DNA (probe AAAATTT), then unconverted, and thus completely methylated DNA (probe TTTAGGG), and finally the complementary cytosine-free strand (probe CCCTAAA) as a control for loading (Figure 6D). A strong AAAATTT hybridization signal suggested that the bulk of telomeric DNA is only weakly methylated.
Nevertheless, a portion of wild-type DNA was resistant to bisulfite conversion, as hybridization with the TTTAGGG oligo probe showed a signal that was ~4-fold higher than a background signal from a corresponding amount of nonmethylated bisulfite-converted telomeric DNA cloned in a plasmid (Figure 6D and 6E). These data further indicate the presence of some heavily methylated CCCTAAA sequences in wild-type plants. Importantly, this CCCTAAA signal was reduced to a background level in rdr2 and nrpd2a mutants (Figure 6D and 6E). To further investigate whether methylation occurs at telomeres, we performed high-stringency hybridization of the bisulfite-converted samples with a long telomeric TTTAGGG probe (Figure 6C). Under these conditions, converted plasmid-cloned telomeric DNA produces a high background hybridization signal that is likely caused by sufficiently stable interactions between longer fragments of the (TTTTAAA)n converted telomeric DNA and the (TTTAGGG)n probe. Nevertheless, wild-type DNA samples still produced a signal that was significantly higher than the background hybridization (Figure 6F). These data, together with the bisulfite sequencing of the 1L-0’ telomeric region, strongly argue that DNA methylation is a general characteristic of Arabidopsis telomeres and that its maintenance requires the RdDM pathway.
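The probe logic of this experiment can be illustrated with a minimal sketch. It is not the authors' analysis code; the function names are hypothetical and the model assumes idealized bisulfite chemistry (every unmethylated C reads as T, every methylated C stays C). It shows why an unmethylated CCCTAAA repeat becomes TTTTAAA and is detected by a (TTTAAAA)n oligo, why a fully methylated repeat is still detected by a TTTAGGG probe, and that every cytosine in the repeat lies in the asymmetric context that the text writes as CNN (often written CHH, with H = A, C or T).

```python
# Illustrative sketch only: idealized bisulfite conversion of telomeric repeats.
# All function names are hypothetical and not taken from the paper.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def bisulfite_convert(seq: str, methylated: frozenset = frozenset()) -> str:
    """Convert each unmethylated C to T; 0-based positions in `methylated` stay C."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def cytosine_context(seq: str, i: int):
    """Classify the C at index i as 'CG', 'CHG' or 'CHH' (H = A, C or T)."""
    if seq[i] != "C":
        raise ValueError("position is not a cytosine")
    following = seq[i + 1 : i + 3]
    if len(following) < 2:
        return None  # too close to the 3' end to classify in this simple sketch
    if following[0] == "G":
        return "CG"
    if following[1] == "G":
        return "CHG"
    return "CHH"


repeat = "CCCTAAA" * 2
all_c = frozenset(i for i, base in enumerate(repeat) if base == "C")

# Unmethylated repeat: converted strand and the oligo that would anneal to it.
print(bisulfite_convert(repeat))                             # TTTTAAATTTTAAA
print(reverse_complement(bisulfite_convert(repeat)))         # TTTAAAATTTAAAA
# Fully methylated repeat resists conversion and still pairs with TTTAGGG.
print(reverse_complement(bisulfite_convert(repeat, all_c)))  # TTTAGGGTTTAGGG
```

Partial methylation, such as methylation of only the third cytosine of each repeat, yields intermediate sequences (for example TTCTAAA) that match neither oligo perfectly, which is one reason the dot-blot reports only the fully converted and fully unconverted fractions.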

Figure 4. Detection of telomeric siRNAs. (A) Northern analysis of small RNAs from wild-type and the indicated RdDM mutants. The membrane was hybridized with a CCCTAAA probe, stripped and rehybridized with the TTTAGGG probe. Electronically merged autoradiograms show faster migration of the C-siRNA that is due to a sequence bias towards pyrimidines. The loading control represents a large RNA that hybridizes to the TTTAGGG probe. (B) Distribution of siRNAs containing at least 12 nt of telomeric sequence in different Argonaute complexes. (C) Distribution of AGO4-associated C- and G-siRNAs according to the extent of homology to telomeric sequence. The total number of siRNAs is indicated on the y-axis. doi:10.1371/journal.pgen.1000986.g004

matic loci and form the most abundant fraction of plant small RNAs [25,26]. They typically associate with Argonaute 4 (AGO4) that is part of the effector complex that, together with Polymerase V, mediates CNN methylation [27,28]. To determine whether telomeric siRNAs associate with AGO4, we surveyed published datasets containing ~600,000 Argonaute (AGO1, AGO2, AGO4 and AGO5)-bound small RNAs [29]. We identified a total of 133 small RNAs containing at least 12 nucleotides with a perfect telomeric repeat (Table S1). As expected, the majority of these small RNAs were associated with AGO4 (Figure 4B). Surprisingly, the AGO4-associated telomeric siRNAs were almost exclusively G-siRNAs and only a few C-siRNAs containing no more than 14 nt of the CCCTAAA repeat sequence were found in the dataset (Figure 4C). Since the levels of total G- and C-siRNAs are similar (Figure 4A), this bias may be caused by a selective incorporation of the G-siRNAs into the AGO4 complex. As TERRA transcripts are produced from telomeres as well as from centromere-located telomeric DNA, the siRNAs may be of either telomeric or centromeric origin. To determine whether telomere-derived transcripts are processed into siRNAs, we aligned Argonaute-associated siRNAs with telomere-adjacent sequences. We found abundant siRNAs corresponding to both strands of subtelomeric DNA at chromosome arms 1L, 1R, 3L, 4R


PLoS Genetics | www.plosgenetics.org


Figure 5. Distribution of Argonaute-associated siRNAs in telomere-adjacent regions. The diagrams represent subtelomeric regions at the indicated chromosome arms. The rulers indicate the distance from a continuous array of telomeric repeats, open boxes mark positions of telomeric repeats intermingled within subtelomeric sequences. Only chromosome arms containing siRNAs detected within 5 kb from telomeres are shown. The position and orientation of AGO-associated siRNAs is indicated by colored bars. Only siRNAs that aligned to unique locations are included (Table S2). The TERRA and ARRET transcripts detected by strand-specific RT-PCR are depicted by a red arrow. doi:10.1371/journal.pgen.1000986.g005

Loss of DNA methylation is often accompanied by chromatin remodeling. However, the decrease in telomeric DNA methylation did not result in a significant loss of heterochromatic histone marks, and both H3K9me2 and H3K27me1 remained enriched at the bulk of telomeric DNA in rdr2 mutants (Figure 7A). However, analysis of histone modifications at the 1L-0’ locus by ChIP and quantitative PCR (Figure 7B and 7C) showed a decrease in H3K9me2 and H3K27me1 (Figure 7C) in rdr2 mutants. These data indicate that although the RdDM-dependent mechanism is not solely responsible for heterochromatin formation at telomeres, it contributes to its maintenance by mediating methylation of telomeric DNA, thereby reinforcing heterochromatic histone modifications. Disruption of telomeric heterochromatin or demethylation of subtelomeric sequences leads to increased telomere elongation and recombination in mouse [7]. Our analysis of telomere length and

intrachromatid recombination at chromosome ends did not reveal any differences between RdDM mutants and wild-type plants (Figure S5 and Figure S6). This observation further corroborates our finding that despite reduced DNA methylation, the bulk of telomeric chromatin in rdr2 mutants still retains heterochromatic features.

Discussion

Heterochromatin is a universal characteristic of chromosome termini in a variety of organisms, including yeast, flies and mammals. Subtelomeric regions in these organisms are gene-poor and enriched for middle to highly repetitive sequences that contribute to the formation of a heritably repressed chromatin structure at chromosome termini that shares similarities with pericentromeric heterochromatin [1,31,32]. Nevertheless, some

Figure 6. Methylation of telomeric DNA in wild-type and RdDM-deficient plants. (A) The chart shows the frequency and distribution of cytosine methylation in the 1L-0’ region. In total, 30 clones from five independently treated wild-type samples and 19 clones from three independent rdr2 samples were analyzed. Asterisks indicate the third cytosine in the CCCTAAA sequence. (B) The proportion of methylated cytosines in the 1L-0’ region of wild-type plants depending on the position within the telomeric repeat as determined by bisulfite sequencing. (C) Frequency of cytosine methylation in the whole 1L-0’ region according to sequence context in wild-type and rdr2 plants. (D) Cytosine methylation in bulk telomeric DNA assayed by dot-blot hybridization. Bisulfite-treated (BS) genomic DNA was spotted onto a membrane (~7, 33 and 200 ng from each sample) and sequentially hybridized with AAAATTT, TTTAGGG and CCCTAAA probes. Untreated wild-type genomic DNA, and untreated and bisulfite-treated (BS) plasmids carrying ~750 nt of Arabidopsis telomeric DNA were used as controls. (E,F) Quantification of signals obtained with oligo (E) and long (F) TTTAGGG probes. The signal intensity of non-converted DNA (obtained with the TTTAGGG probe) was normalized to the amount of telomeric DNA determined from hybridization with the CCCTAAA probe. The signal from BS-treated plasmid served to determine background hybridization of the probes to fully converted non-methylated telomeric DNA. Error bars represent standard deviation (N = 3). doi:10.1371/journal.pgen.1000986.g006


cations are most pronounced immediately next to telomeres and that their presence gradually recedes with growing distance from telomeres. Data on whole-genome distribution of H3K9me2 indicate that this also holds true for telomere-associated regions of several other chromosome arms [21]. These data infer that repressive histone marks are primarily established at telomeres and spread only a limited distance within adjacent subtelomeric sequences. The existence of such relatively small clusters of repressive chromatin (2–5 kb) next to otherwise large gene-rich regions suggests a functional importance for the heterochromatization of telomeres in Arabidopsis. It further suggests the existence of mechanisms that specifically maintain repressive histone modifications at telomeres. Assembly of heterochromatin at chromosome ends in budding yeast is partially dependent on tethering Sir proteins to telomeres via the Rap1 telomere-binding protein [31]. Human SIRT6 histone deacetylase preferentially associates with telomeres, although how it is recruited to chromosome termini is not known [3]. A recent study in mice overexpressing TRF2 indicates that, similar to the situation in yeast, heterochromatin formation at telomeres in mammals may also involve telomere-binding proteins [37]. The discovery of TERRA provides another attractive model that involves targeting the chromatin remodeling machinery to chromosome termini through ncRNA [38,39]. This suggestion was recently corroborated by the finding that downregulation of TERRA by RNAi in human cells causes a decrease in histone H3K9 methylation [13]. It was proposed that TERRA facilitates heterochromatin formation by stabilizing interactions between heterochromatin factors and telomeric DNA. In this study, we demonstrate expression of telomeric transcripts in Arabidopsis and describe a mechanism by which telomeric repeats-containing RNAs affect telomeric chromatin through siRNA. 
In contrast to the situation in mammals, where only UUAGGG telomeric transcripts were detected [10,11], both telomeric strands appear to be transcribed from some telomeres in Arabidopsis. This indicates that canonical telomeric DNA may, under certain circumstances, act as a promoter and initiate transcription. Two lines of observations further corroborate the link between transcription and telomeric DNA in Arabidopsis. Firstly, short stretches of a telomeric sequence were found in numerous Arabidopsis promoters and it has been shown that these interstitial telomere motifs are required for transcription [40]. Secondly, several transcription factors have been identified in Arabidopsis that specifically bind to telomeric DNA in electromobility shift assays (reviewed in [41]). Thus, it is possible that some of these transcription factors localize to telomeres and promote their expression. In addition to transcripts that originated at telomeres, we detected TERRA and ARRET that are apparently generated by transcription of centromere-associated telomeric loci. We cannot currently determine the exact identity of telomere- or centromerederived TERRA/ARRET that is processed by DCL3 and degraded to telomeric siRNAs. The requirement of RDR2 for siRNA formation indicates that the predicted dsRNA intermediate is not a simple annealing product of complementary TERRA and ARRET, but is dependent on additional RNA-dependent RNA synthesis. Thus, even relatively low level transcripts can yield significant amounts of siRNA. In fact, direct detection of precursor transcripts in the RdDM pathway has been so far reported only in a special transgene system [42]. In plants, heterochromatic siRNAs serve to guide DNA methylases to specific asymmetric CNN positions in a mechanism that relies on AGO4 [28]. Interestingly, AGO4 appears to retain telomeric G-siRNAs, and not the complementary C-siRNAs, although these data should still

Figure 7. Telomeric heterochromatin in RdDM mutants. (A) Representative ChIP data of wild-type and rdr2 plants. Chromatin was immunoprecipitated with antibodies against histone H3, H3K4me3, H3K9me2 and H3K27me1, blotted onto a membrane and hybridized with the TTTAGGG probe. The same membrane was stripped after exposure and rehybridized with the centromeric CEN180 probe. Data from three independent ChIP experiments were used for quantification. Signals were normalized to mock. (B) Analysis of ChIP fractions from wild-type and rdr2 plants by PCR with primers spanning the 1L-0 locus. (C) qPCR analysis of H3K4me3, H3K9me2 and H3K27me1 at the 1L-0 locus in wild-type and rdr2 plants. Each value represents an average of three qPCR measurements normalized to histone H3 occupancy for each ChIP fraction. The results of three independent pairwise ChIP experiments (1, 2 and 3) are presented. doi:10.1371/journal.pgen.1000986.g007

aspects of chromatin organization appear to be unique at telomeres, as telomeric chromatin in humans and plants displays unusually short nucleosomal spacing (~160 nt) in comparison with the ~180 nt periodicity at the bulk of chromatin [33–35]. In contrast to many other organisms, telomeres in Arabidopsis are directly adjacent to transcriptionally active genes. This situation is more similar to silenced transposons inserted in gene-rich regions than to pericentromeric heterochromatin. This is also reflected in the organization of telomeric chromatin that exhibits features of intermediate heterochromatin that is characterized by the presence of both active and repressive histone H3 marks. Such chromatin was described to be associated with some Arabidopsis transposons and transgenic loci [22,36]. Chromatin analysis of the 1L subtelomere demonstrates that repressive histone H3 modifi-


described [49]. Centromeric transcripts were detected by hybridization with a [32P]-labeled CEN180 repeat unit amplified from Arabidopsis genomic DNA using primers CEN1 and CEN2 (Table S3). For RT-PCR analyses, ~2 µg of total RNA was reverse transcribed by using oligo dT for gene expression. The (CCCTAAA)3 oligo or subtelomere-specific primers (Table S3) were used for RT of TERRA and ARRET, respectively. The respective cDNAs were amplified by 25–35 cycles of PCR with specific primers (Table S3). Small RNAs were isolated from inflorescences using the mirVana miRNA isolation kit (Ambion), separated on 15% polyacrylamide gels and electroblotted onto a nylon membrane. Telomeric siRNAs were detected by hybridization with either (TTTAGGG)4 or (TAAACCC)4 oligo probes in ULTRAhyb-Oligo hybridization buffer (Ambion) at 42 °C. The artificial 25 and 23 nt siRNAs were synthesized by in vitro transcription using T7 RNA polymerase (MBI). The T7-TOP oligonucleotide (10 mM) was annealed to a template oligonucleotide (10 mM) as indicated in Figure S3. In vitro transcription was carried out with 30 U of T7 RNA polymerase (MBI) and the annealed oligos (0.5 mM) in 50 µL of 1× Transcription buffer (MBI) supplemented with NTPs (10 mM) and RiboLock RNase inhibitors (MBI) for 60 min at 37 °C. 25 µL of the reaction was separated on a 15% polyacrylamide gel, electroblotted onto a nylon membrane and analyzed by Southern hybridization.

be verified by Northern analysis of AGO4 co-immunoprecipitated siRNAs. It is unknown whether the bias towards G-siRNAs is of biological significance, but it is interesting that the AGO4 complex appears to specifically retain siRNAs complementary to the telomeric strand to be methylated. Our data, showing methylation of bulk telomeric DNA as well as heavy methylation of the centromere-proximal region of the 1L telomere, together with data from whole genome bisulfite sequencing [30], argue that telomeric heterochromatin in Arabidopsis is not only defined by histone modifications, but also by DNA methylation. Although mammalian telomeres lack CG sites, and are, thus, believed to be unmethylated, at least two proteins linked to DNA methylation (SMCHD1, MBD3) have been found in purified fractions of human telomeric chromatin [43]. Additionally, the recent discovery of CNN and CNG methylation in human embryonic stem cells warrants the reexamination of DNA methylation at human telomeres [44]. We demonstrate that the maintenance of telomeric DNA methylation depends, to a large extent, on heterochromatic siRNA and the RdDM machinery. Intriguingly, loss of telomeric DNA methylation only has a slight effect on histone methylation at bulk telomeres, indicating that assembly of Arabidopsis telomeric heterochromatin relies on several reinforcing mechanisms that recruit histone methyltransferases such as SUVH4 to telomeres [45]. Loss of DNA methylation has a more profound effect on histone methylation at the centromere-proximal part of the 1L telomere. This indicates that RdDM may play a role in maintaining heterochromatin at the boundary between telomeres and adjacent euchromatic genes. The involvement of siRNA in modulation of telomeric heterochromatin may not be restricted to plants. 
Our data in Arabidopsis are reminiscent of the situation in fission yeast where heterochromatin in subtelomeric regions is established by two independent pathways, one of which relies on the telomere-binding protein Taz1, while the other involves RNA-induced transcriptional silencing (RITS) [46]. However, in contrast to the situation in Arabidopsis where siRNA targets canonical telomeric repeats, RITS in fission yeast is directed at centromere-like sequences that are located ~15 kb from telomeres. In humans, TERRA has been proposed to act as a scaffold, reinforcing interactions between telomere-binding proteins and heterochromatin factors such as ORC1 and HP1 [13]. Nevertheless, human TERRA could also promote heterochromatin formation through an siRNA-mediated pathway. This notion is supported by the observation that enrichment of Argonaute-1 at human telomeres is correlated with increased H3K9 methylation and HP1 association [47], and by the discovery of telomere-derived human siRNAs [48].

Analysis of DNA methylation

Genomic DNA was extracted from 4-week-old plants with the DNeasy Plant Maxi Kit (Qiagen). Bisulfite modification was performed using the EpiTect Bisulfite Kit (Qiagen) according to the manufacturer’s instructions. The completeness of the conversion was tested by PCR amplification of a nonmethylated genomic region [50]. Modified DNA was used as a template for PCR amplification with the primers indicated in Table S3. The PCR products were cloned into the pCR2.1 TOPO cloning vector (Invitrogen) and sequenced using a BigDye terminator and an ABI310 sequencer (Applied Biosystems). The sequence of the clones was analyzed with the software CyMATE [50]. The efficiency of cytosine conversion in the 1L-0’ region in these samples was further controlled by either spiking genomic DNA with a bacterial plasmid containing a region that partially overlaps with 1L-0’ or by sequence analysis of other genomic loci that are devoid of 5-methylcytosines. For methylation analysis at bulk telomeric DNA, bisulfite-modified genomic DNA was transferred onto a nylon membrane by vacuum-blotting. As a control, a bisulfite-modified plasmid containing 750 bp of plant non-methylated telomeric DNA was blotted onto the membrane in an amount that roughly corresponded to the total amount of telomeric DNA present in genomic samples (1 ng of the plasmid contained telomeric DNA equivalent to ~260 ng of genomic DNA). The membrane was hybridized with the [32P] 5′ end-labeled (TTTAAAA)4 oligo (AAAATTT probe) in a standard hybridization buffer [49] at 40 °C. The membrane was washed twice for 10 min at 40 °C in 2× SSC followed by a 40 min wash in 1× SSC at 40 °C. The membrane was exposed to a Kodak Phosphor screen (Biorad) and scanned with Molecular Imager FX (Biorad). The membrane was then stripped and sequentially rehybridized with the TTTAGGG and CCCTAAA oligo probes at 55 °C as described [49].
The final rehybridization was performed at 65 °C with a strand-specific (TTTAGGG)n probe that was obtained by labeling of a 750 bp fragment of telomeric DNA with α-[32P]-GTP. The signals were quantified using QuantityOne software (Biorad).
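The dot-blot quantification described here and in the legend of Figure 6 can be sketched as a small calculation. This is an assumed reading of the procedure, not the authors' script: the function names and the numbers in the usage example are hypothetical, and the use of the sample standard deviation for the error bars is an assumption, since the text only states "standard deviation (N = 3)".

```python
# Illustrative sketch only: normalizing the unconverted (TTTAGGG probe) signal
# to the loading control (CCCTAAA probe) and expressing it relative to the
# bisulfite-treated plasmid background. Names and values are hypothetical.
from statistics import mean, stdev


def fold_over_background(unconverted: float, loading: float,
                         plasmid_unconverted: float, plasmid_loading: float) -> float:
    """Return the loading-normalized sample signal divided by the normalized plasmid background."""
    sample_ratio = unconverted / loading
    background_ratio = plasmid_unconverted / plasmid_loading
    return sample_ratio / background_ratio


def summarize(replicates):
    """Return (mean, sample standard deviation) of replicate fold values."""
    return mean(replicates), stdev(replicates)


# Hypothetical intensities in arbitrary units, for illustration only.
fold = fold_over_background(unconverted=400.0, loading=100.0,
                            plasmid_unconverted=50.0, plasmid_loading=50.0)
print(fold)                         # 4.0
print(summarize([3.0, 4.0, 5.0]))   # (4.0, 1.0)
```

Dividing by the plasmid ratio rather than subtracting it is a design choice made here so that the result reads directly as a fold difference over background, in line with the "~4-fold" comparison given in the Results.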

Materials and Methods

Plant material and growth conditions

Arabidopsis mutants carrying the following alleles were used in this study: dcl3-1 (dcl3), rdr2-1 (rdr2), nrpd1a-4 (nrpd1), nrpd1b-1 (nrpe1), sgs2-1 (rdr6), drd1-1 (drd1) and nrpd2a-1 (nrpd2). Plants were grown in soil under long-day conditions (16 h light/8 h dark) at 22 °C.

RNA analyses

Total RNA was extracted using TriReagent solution (Sigma). For Northern blot analysis, 10 µg aliquots were separated on 1.2% formaldehyde agarose gels, blotted onto a nylon membrane and hybridized with [32P] 5′ end-labeled (TTTAGGG)4 (TTTAGGG probe) or (TAAACCC)4 (CCCTAAA probe) oligonucleotides. Oligo hybridizations were carried out at 55 °C as previously


sequences in close proximity. To look for the presence of such loci in the Arabidopsis genome, we performed PCR with combinations of primers that flank the centromeric CEN180 satellite repeat (CEN1, CEN2) as well as with primers that anneal either to the C-rich or G-rich telomeric strands (TelC, TelG). The reaction with both centromeric primers resulted in a ladder whose periodicity corresponds to the size of the CEN180 repeat unit (180 bp; lane 1). Importantly, products were also amplified in reactions in which the CEN1 primer was combined with either of the telomeric primers (lanes 5 and 6), demonstrating that telomeric sequences are adjacent to the CEN180 repeat. Furthermore, strong amplification products were obtained in reactions containing a single telomeric primer (lanes 8 and 9), indicating the existence of sequences that contain telomeric repeats in inverted orientation. (C) Intrachromosomal telomeric sequences do not efficiently hybridize to a telomeric probe under high-stringency conditions. Genomic DNA was digested with TruI1 restriction endonuclease, blotted onto a membrane and hybridized to a TTTAGGG probe under high-stringency conditions (65 °C). The membrane was stripped after exposure and rehybridized to an oligo TTTAGGG probe under low-stringency conditions (55 °C). While under low-stringency hybridization interstitial telomeric DNA (~0.5 kb) contributed ~21% of the total telomeric signal, these sequences were barely detectable when the membrane was rehybridized at high stringency, and more than 98% of the signal was derived from terminal restriction fragments ranging between 2–4 kb. [Uchida W, Matsunaga S, Sugiyama R, Shibata F, Kazama Y, et al. (2002) Distribution of interstitial telomere-like repeats and their adjacent sequences in a dioecious plant, Silene latifolia. Chromosoma 111: 313–320. Vannier JB, Depeiges A, White C, Gallego ME (2009) ERCC1/XPF protects short telomeres from homologous recombination in Arabidopsis thaliana.
PLoS Genet 5: e1000380. Armstrong SJ, Caryl AP, Jones GH, Franklin FC (2002) Asy1, a protein required for meiotic chromosome synapsis, localizes to axis-associated chromatin in Arabidopsis and Brassica. J Cell Sci 115: 3645–3655.] Found at: doi:10.1371/journal.pgen.1000986.s001 (9.55 MB TIF)

Chromatin immunoprecipitation

Chromatin isolation and immunoprecipitation were performed as described [51] using antibodies against histone H3 (Abcam; cat. no. ab1791), H3K9me2 (Abcam; cat. no. ab1220), H3K4me3 (Abcam; cat. no. ab8580) and H3K27me1 (provided by Thomas Jenuwein). The DNA was column-purified from immunoprecipitated chromatin and concentrated in 50 µl of elution buffer. For dot-blot analysis, 40 µl of the DNA was blotted onto a nylon membrane and analyzed by hybridization with a [32P]-labeled 750 bp (TTTAGGG)n probe. For PCR analysis, 1 µl of the eluted DNA was amplified by 30 cycles of PCR with the primers specified in Table S1. Quantitative PCR analysis of the 1L-0 region was performed using the iQ5 Real Time PCR detection system (Biorad) and a 2× SensiMix Plus SYBR Kit (PeqLab).

Analysis of Argonaute-associated siRNAs

The sequences of Argonaute-associated siRNAs were retrieved from the NCBI (accession number GSE10036). The individual AGO datasets were searched for the presence of siRNAs containing a string of at least 12 nucleotides of Arabidopsis telomeric repeats of any possible permutation. Telomeric siRNAs were copied into an Excel table and manually annotated. Subtelomeric siRNAs were identified by attempting to align all Argonaute-associated siRNAs to an ~15 kb sequence from the ends of each Arabidopsis chromosome using the publicly available program SOAP [52]. The subtelomeric sequences were derived from the sequences of whole chromosomes available in TAIR, and from cloned fragments of telomere-associated sequences deposited in GenBank (AB033278 and AM177017). Perfectly matching siRNA alignments were retained, and plotted using R.
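The search for small RNAs carrying at least 12 nt of telomeric repeat can be sketched as follows. This is one possible implementation under stated assumptions, not the procedure the authors used (the text only says the datasets were searched and hits were annotated manually). It interprets "any possible permutation" as any register of the TTTAGGG repeat on either strand; the function names, the six-copy reference strings and the example reads are hypothetical.

```python
# Illustrative sketch only: find the longest perfect telomeric stretch in a read.
# Six repeat copies (42 nt) cover every register of runs up to 36 nt,
# which is assumed to be enough for small RNAs.
G_STRAND = "TTTAGGG" * 6
C_STRAND = "CCCTAAA" * 6


def longest_telomeric_run(read: str):
    """Return (length, strand) of the longest substring of `read` that is a perfect
    telomeric repeat in any register; strand is 'G', 'C' or None."""
    read = read.upper().replace("U", "T")  # accept RNA or DNA letters
    best_len, best_strand = 0, None
    for strand, reference in (("G", G_STRAND), ("C", C_STRAND)):
        for start in range(len(read)):
            # Only try to beat the current best; any longer match from this
            # start must have a matching prefix of length best_len + 1.
            end = start + best_len + 1
            while end <= len(read) and read[start:end] in reference:
                best_len, best_strand = end - start, strand
                end += 1
    return best_len, best_strand


def is_telomeric(read: str, min_len: int = 12) -> bool:
    """True if the read contains at least `min_len` nt of perfect telomeric repeat."""
    return longest_telomeric_run(read)[0] >= min_len


# Hypothetical reads for illustration.
print(longest_telomeric_run("TTAGGGTTTAGGGTTTAGGGTTTA"))  # (24, 'G')
print(longest_telomeric_run("CCCUAAACCCUAAACC"))          # (16, 'C')
print(is_telomeric("ACGTACGTACGTACGTACGTA"))              # False
```

Reporting the strand alongside the length makes it straightforward to tally G-siRNAs and C-siRNAs separately, as in Figure 4C.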

Cytology

Mitotic chromosomes prepared from pistils of wild-type plants were subjected to fluorescence in situ hybridization (FISH) with a Cy3-conjugated (CCCTAAA)2 PNA probe (Metabion) as previously described [53]. Chromosomes were examined using a Zeiss Axioscope fluorescence microscope equipped with a CCD camera.

Figure S2 Analysis of bidirectional transcription of the GUS reporter gene from the 2Rp promoter. (A) Schematic representation of constructs used for this experiment. A ~500 bp genomic fragment (indicated by arrow) localized between the telomere and At2g48160 was cloned in sense and antisense orientations in front of the GUS reporter gene containing a ~200 nt intron (the intron is indicated by a thin line; empty boxes represent exons). The resulting constructs (2Rp:GUS and R2p:GUS, respectively) were randomly inserted into the Arabidopsis genome using Agrobacterium-mediated transformation. T2 transgenic plants were analyzed for GUS expression by histochemical GUS assay and RT-PCR. Primers used for RT-PCR are indicated by arrows. (B) In total, T2 seedlings of 12 independent transgenic lines carrying the 2Rp:GUS construct, and 11 lines with the R2p:GUS construct, were analyzed by histochemical GUS assay for the presence of GUS enzymatic activity. Two representative lines for each construct are shown. While 11 out of 12 2Rp:GUS lines produced blue staining, none of the R2p:GUS lines gave a positive GUS signal. This experiment shows robust activity of the 2R promoter and confirms the RT-PCR data on the expression of the At2g48160 gene (Figure 1). (C) To further analyze whether the 2R promoter can drive expression in the antisense orientation, the presence of spliced GUS transcripts was examined by RT-PCR. The expected 133 bp long product was readily detected in all analyzed R2p:GUS lines. Interestingly, a weak but specific product was also amplified in four out of six 2Rp:GUS lines. These data are consistent with the RT-PCR

Telomere analyses

The PETRA assay was carried out with genomic DNA extracted from a fifth generation tert mutant plant [54] according to the published protocol [18]. Terminal restriction fragment analysis was performed as described [49,55]. Analysis of intrachromatid telomeric recombination was performed by the t-circle amplification assay [56]. DNA extracted from Arabidopsis ku70 [57] mutants was used as a positive control.

Supporting Information

Figure S1 Localization of telomeric DNA in Arabidopsis centromeres. A survey of the Arabidopsis genome led to the identification of several regions carrying short stretches of telomeric sequence in the proximity of centromeres (Uchida et al., 2002; Vannier et al., 2009). Fluorescent in situ hybridization (FISH) on pachytene chromosomes further showed co-localization of CEN180 and telomeric signals at the centromere of chromosome 1 (Armstrong et al., 2002). (A) This centromere-localized telomeric DNA is also readily detectable by FISH on mitotic metaphase chromosomes. The picture shows a diploid metaphase figure with ten Arabidopsis chromosomes counterstained with DAPI (red). Green signals represent telomeric DNA; the chromosome pair carrying centromere-localized telomeric DNA is indicated by arrows. (B) The poor annotation of Arabidopsis centromeres precluded in silico identification of genomic loci carrying telomeric and CEN180


Figure S5 Telomere length analysis in RdDM-deficient mutants.

analysis of TERRA at 2R (Figure 3D) and demonstrate that the 2Rp promoter can initiate transcription into the telomere. Found at: doi:10.1371/journal.pgen.1000986.s002 (3.40 MB TIF)

Southern analysis of Tru9I-digested genomic DNA hybridized with a telomeric probe. Each line represents a DNA sample extracted from a single plant. The telomere length in all analyzed mutants falls in the range typical for wild-type plants (2–5 kb). Found at: doi:10.1371/journal.pgen.1000986.s005 (1.63 MB TIF)

Figure S3 Determination of the size of the telomeric siRNA. (A)

Synthetic telomeric DNA and RNA oligonucleotides were used as size markers. DNA oligonucleotides are written in black. The telomeric G-RNAs (23 nt, 25 nt) and 23 nt C-RNA (written in blue) were synthesized by in vitro transcription from dsDNA produced by annealing the DNA oligonucleotides as indicated. (B) To determine the difference in migration between complementary telomeric RNA oligonucleotides, in vitro transcribed telomeric RNA as well as the indicated synthetic DNA oligonucleotides were separated by PAGE, electro-blotted onto a nylon membrane and sequentially hybridized with the CCCTAAA and TTTAGGG probes (only the right part of the membrane is shown after TTTAGGG hybridization). This experiment shows that 23 and 25 nt G-RNAs migrate like 25 and 27 nt TelG DNA oligonucleotides, respectively, while the 23 nt C-RNA migrates like the 22 nt TelG DNA oligonucleotide. This experiment demonstrates that C-RNA migrates faster than G-RNA of the corresponding size. (C) DNA oligonucleotides were used as size markers and separated together with plant siRNAs by PAGE, blotted onto a membrane and hybridized with the radioactively-labeled CCCTAAA probe. Migration of plant floral G-siRNAs (marked by asterisks) corresponds to the migration of 25–26 nt TelG DNA oligonucleotides. The signal was stripped after exposure and the membrane was rehybridized with the TTTAGGG probe for detection of the C-siRNAs (marked by asterisks). Because the signal from TelG DNA oligonucleotides was not completely stripped, we could use it as a marker to determine that plant C-siRNAs migrate like 24–25 nt TelG DNA oligonucleotides. Taking into account the difference in the migration of telomeric DNA and RNA (Figure S3B), we calculate that the size of plant G-siRNAs is 23–24 nt, and the size of the C-siRNAs is 24–25 nt. Found at: doi:10.1371/journal.pgen.1000986.s003 (10.61 MB TIF)

Figure S6 Analysis of intrachromatid recombination in RdDM-deficient mutants. Intrachromatid telomere recombination leads to excision of telomeric extrachromosomal circular DNA molecules (t-circles). We used the highly sensitive t-circle amplification assay to analyze the level of t-circles in RdDM mutants [56]. Genomic DNA was digested with the AluI restriction enzyme and digestion-resistant t-circles were used as templates for primer extension via rolling circle amplification by the highly processive Phi29 polymerase. The high molecular weight products of rolling circle replication (indicated by an arrow) were separated from the bulk of digested genomic DNA by alkaline electrophoresis and were detected by Southern hybridization with a telomeric probe. Whereas a strong t-circle signal was obtained in ku70 mutants that exhibit increased telomeric recombination [56], no t-circles, which would indicate an elevated level of recombination, were detected in the RdDM-deficient plants. Found at: doi:10.1371/journal.pgen.1000986.s006 (1.30 MB TIF)

Table S1 Argonaute-associated telomeric siRNAs. The region of homology to the Arabidopsis telomeric sequence is indicated in red. Found at: doi:10.1371/journal.pgen.1000986.s007 (0.03 MB XLS)

Table S2 Argonaute-associated siRNAs that uniquely align to subtelomeric regions. Found at: doi:10.1371/journal.pgen.1000986.s008 (0.05 MB XLS)

Table S3 Primers used in this study. Found at: doi:10.1371/journal.pgen.1000986.s009 (0.09 MB DOC)

Figure S4 Cytosine methylation in the subtelomeric regions 2R’, 1L-2’, and 1L-3’. Wild-type bisulfite-treated genomic DNA was used as a template for PCR with primers spanning subtelomeric regions 2R’, 1L-2’, and 1L-3’ (Figure 2B). The diagrams representing distribution of 5-methylcytosines in individual clones were generated using CyMATE software for analysis of sequencing data of bisulfite-converted samples [50]. The data show almost a complete lack of DNA methylation in these regions. Found at: doi:10.1371/journal.pgen.1000986.s004 (9.05 MB TIF)

Acknowledgments

We thank Thomas Jenuwein for providing the H3K27me1 antibody and Maria Siomos and Marjori Matzke for helpful comments on the manuscript.

Author Contributions

Conceived and designed the experiments: JV SA JMW BV WA KR. Performed the experiments: JV SA JMW WA. Analyzed the data: JV SA JMW TLT WA KR. Contributed reagents/materials/analysis tools: LD. Wrote the paper: JMW KR.

References

1. Blasco MA (2007) The epigenetic regulation of mammalian telomeres. Nat Rev Genet 8: 299–309.
2. Ottaviani A, Gilson E, Magdinier F (2008) Telomeric position effect: from the yeast paradigm to human pathologies? Biochimie 90: 93–107.
3. Michishita E, McCord RA, Berber E, Kioi M, Padilla-Nash H, et al. (2008) SIRT6 is a histone H3 lysine 9 deacetylase that modulates telomeric chromatin. Nature 452: 492–496.
4. Garcia-Cao M, O’Sullivan R, Peters AH, Jenuwein T, Blasco MA (2004) Epigenetic regulation of telomere length in mammalian cells by the Suv39h1 and Suv39h2 histone methyltransferases. Nat Genet 36: 94–99.
5. Gonzalo S, Garcia-Cao M, Fraga MF, Schotta G, Peters AH, et al. (2005) Role of the RB1 family in stabilizing histone methylation at constitutive heterochromatin. Nat Cell Biol 7: 420–428.
6. Jones B, Su H, Bhat A, Lei H, Bajko J, et al. (2008) The histone H3K79 methyltransferase Dot1L is essential for mammalian development and heterochromatin structure. PLoS Genet 4: e1000190. doi:10.1371/journal.pgen.1000190.
7. Gonzalo S, Jaco I, Fraga MF, Chen T, Li E, et al. (2006) DNA methyltransferases control telomere length and telomere recombination in mammalian cells. Nat Cell Biol 8: 416–424.
8. Zaratiegui M, Irvine DV, Martienssen RA (2007) Noncoding RNAs and gene silencing. Cell 128: 763–776.
9. Luke B, Panza A, Redon S, Iglesias N, Li Z, et al. (2008) The Rat1p 5′ to 3′ exonuclease degrades telomeric repeat-containing RNA and promotes telomere elongation in Saccharomyces cerevisiae. Mol Cell 32: 465–477.
10. Azzalin CM, Reichenbach P, Khoriauli L, Giulotto E, Lingner J (2007) Telomeric repeat containing RNA and RNA surveillance factors at mammalian chromosome ends. Science 318: 798–801.
11. Schoeftner S, Blasco MA (2008) Developmentally regulated transcription of mammalian telomeres by DNA-dependent RNA polymerase II. Nat Cell Biol 10: 228–236.
12. Yehezkel S, Segev Y, Viegas-Pequignot E, Skorecki K, Selig S (2008) Hypomethylation of subtelomeric regions in ICF syndrome is associated with abnormally short telomeres and enhanced transcription from telomeric regions. Hum Mol Genet 17: 2776–2789.
13. Deng Z, Norseen J, Wiedmer A, Riethman H, Lieberman PM (2009) TERRA RNA binding to TRF2 facilitates heterochromatin formation and ORC recruitment at telomeres. Mol Cell 35: 403–413.

PLoS Genetics | www.plosgenetics.org


June 2010 | Volume 6 | Issue 6 | e1000986


Modulation of Telomeric Chromatin by siRNA

14. Moazed D (2009) Small RNAs in transcriptional gene silencing and genome defence. Nature 457: 413–420.
15. Pikaard CS, Haag JR, Ream T, Wierzbicki AT (2008) Roles of RNA polymerase IV in gene silencing. Trends Plant Sci 13: 390–397.
16. Matzke M, Kanno T, Daxinger L, Huettel B, Matzke AJ (2009) RNA-mediated chromatin-based silencing in plants. Curr Opin Cell Biol.
17. (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408: 796–815.
18. Heacock M, Spangler E, Riha K, Puizina J, Shippen DE (2004) Molecular analysis of telomere fusions in Arabidopsis: multiple pathways for chromosome end-joining. EMBO J 23: 2304–2313.
19. Copenhaver GP, Pikaard CS (1996) RFLP and physical mapping with an rDNA-specific endonuclease reveals that nucleolus organizer regions of Arabidopsis thaliana adjoin the telomeres on chromosomes 2 and 4. Plant J 9: 259–272.
20. Kuo HF, Olsen KM, Richards EJ (2006) Natural variation in a subtelomeric region of Arabidopsis: implications for the genomic dynamics of a chromosome end. Genetics 173: 401–417.
21. Bernatavichute YV, Zhang X, Cokus S, Pellegrini M, Jacobsen SE (2008) Genome-wide association of histone H3 lysine nine methylation with CHG DNA methylation in Arabidopsis thaliana. PLoS ONE 3: e3156. doi:10.1371/journal.pone.0003156.
22. Habu Y, Mathieu O, Tariq M, Probst AV, Smathajitt C, et al. (2006) Epigenetic regulation of transcription in intermediate heterochromatin. EMBO Rep 7: 1279–1284.
23. May BP, Lippman ZB, Fang Y, Spector DL, Martienssen RA (2005) Differential regulation of strand-specific transcripts from Arabidopsis centromeric satellite repeats. PLoS Genet 1: e79. doi:10.1371/journal.pgen.0010079.
24. Huettel B, Kanno T, Daxinger L, Bucher E, van der Winden J, et al. (2007) RNA-directed DNA methylation mediated by DRD1 and Pol IVb: a versatile pathway for transcriptional gene silencing in plants. Biochim Biophys Acta 1769: 358–374.
25. Mosher RA, Schwach F, Studholme D, Baulcombe DC (2008) PolIVb influences RNA-directed DNA methylation independently of its role in siRNA biogenesis. Proc Natl Acad Sci U S A 105: 3145–3150.
26. Zhang X, Henderson IR, Lu C, Green PJ, Jacobsen SE (2007) Role of RNA polymerase IV in plant small RNA metabolism. Proc Natl Acad Sci U S A 104: 4536–4541.
27. He XJ, Hsu YF, Zhu S, Wierzbicki AT, Pontes O, et al. (2009) An effector of RNA-directed DNA methylation in Arabidopsis is an ARGONAUTE 4- and RNA-binding protein. Cell 137: 498–508.
28. Wierzbicki AT, Ream TS, Haag JR, Pikaard CS (2009) RNA polymerase V transcription guides ARGONAUTE4 to chromatin. Nat Genet 41: 630–634.
29. Mi S, Cai T, Hu Y, Chen Y, Hodges E, et al. (2008) Sorting of small RNAs into Arabidopsis argonaute complexes is directed by the 5′ terminal nucleotide. Cell 133: 116–127.
30. Cokus SJ, Feng S, Zhang X, Chen Z, Merriman B, et al. (2008) Shotgun bisulphite sequencing of the Arabidopsis genome reveals DNA methylation patterning. Nature 452: 215–219.
31. Perrod S, Gasser SM (2003) Long-range silencing and position effects at telomeres and centromeres: parallels and differences. Cell Mol Life Sci 60: 2303–2318.
32. Pryde FE, Gorham HC, Louis EJ (1997) Chromosome ends: all the same under their caps. Curr Opin Genet Dev 7: 822–828.
33. Tommerup H, Dousmanis A, de Lange T (1994) Unusual chromatin in human telomeres. Mol Cell Biol 14: 5777–5785.
34. Fajkus J, Kovarik A, Kralovics R, Bezdek M (1995) Organization of telomeric and subtelomeric chromatin in the higher plant Nicotiana tabacum. Mol Gen Genet 247: 633–638.
35. Sykorova E, Fajkus J, Ito M, Fukui K (2001) Transition between two forms of heterochromatin at plant subtelomeres. Chromosome Res 9: 309–323.


36. Lippman Z, May B, Yordan C, Singer T, Martienssen R (2003) Distinct mechanisms determine transposon inheritance and methylation via small interfering RNA and histone modification. PLoS Biol 1: e67. doi:10.1371/journal.pbio.0000067.
37. Benetti R, Schoeftner S, Munoz P, Blasco MA (2008) Role of TRF2 in the assembly of telomeric chromatin. Cell Cycle 7: 3461–3468.
38. Horard B, Gilson E (2008) Telomeric RNA enters the game. Nat Cell Biol 10: 113–115.
39. Luke B, Lingner J (2009) TERRA: telomeric repeat-containing RNA. EMBO J 28: 2503–2510.
40. Tremousaygue D, Manevski A, Bardet C, Lescure N, Lescure B (1999) Plant interstitial telomere motifs participate in the control of gene expression in root meristems. Plant J 20: 553–561.
41. Zellinger B, Riha K (2007) Composition of plant telomeres. Biochim Biophys Acta 1769: 399–409.
42. Daxinger L, Kanno T, Bucher E, van der Winden J, Naumann U, et al. (2009) A stepwise pathway for biogenesis of 24-nt secondary siRNAs and spreading of DNA methylation. EMBO J 28: 48–57.
43. Dejardin J, Kingston RE (2009) Purification of proteins associated with specific genomic Loci. Cell 136: 175–186.
44. Lister R, Pelizzola M, Dowen RH, Hawkins RD, Hon G, et al. (2009) Human DNA methylomes at base resolution show widespread epigenomic differences. Nature.
45. Grafi G, Ben-Meir H, Avivi Y, Moshe M, Dahan Y, et al. (2007) Histone methylation controls telomerase-independent telomere lengthening in cells undergoing dedifferentiation. Dev Biol 306: 838–846.
46. Kanoh J, Sadaie M, Urano T, Ishikawa F (2005) Telomere binding protein Taz1 establishes Swi6 heterochromatin independently of RNAi at telomeres. Curr Biol 15: 1808–1819.
47. Ho CY, Murnane JP, Yeung AK, Ng HK, Lo AW (2008) Telomeres acquire distinct heterochromatin characteristics during siRNA-induced RNA interference in mouse cells. Curr Biol 18: 183–187.
48. Cao F, Li X, Hiew S, Brady H, Liu Y, et al. (2009) Dicer independent small RNAs associate with telomeric heterochromatin. RNA 15: 1274–1281.
49. Riha K, Fajkus J, Siroky J, Vyskot B (1998) Developmental control of telomere lengths and telomerase activity in plants. Plant Cell 10: 1691–1698.
50. Hetzl J, Foerster AM, Raidl G, Mittelsten Scheid O (2007) CyMATE: a new tool for methylation analysis of plant genomic DNA after bisulphite sequencing. Plant J 51: 526–536.
51. Lawrence RJ, Earley K, Pontes O, Silva M, Chen ZJ, et al. (2004) A concerted DNA methylation/histone methylation switch regulates rRNA gene dosage control and nucleolar dominance. Mol Cell 13: 599–609.
52. Li R, Li YR, Kristiansen K, Wang J (2008) SOAP: short oligonucleotide alignment program. Bioinformatics 24: 713.
53. Akimcheva S, Zellinger B, Riha K (2008) Genome stability in Arabidopsis cells exhibiting alternative lengthening of telomeres. Cytogenet Genome Res 122: 388–395.
54. Riha K, McKnight TD, Griffing LR, Shippen DE (2001) Living with genome instability: plant responses to telomere dysfunction. Science 291: 1797–1800.
55. Fitzgerald MS, Riha K, Gao F, Ren S, McKnight TD, et al. (1999) Disruption of the telomerase catalytic subunit gene from Arabidopsis inactivates telomerase and leads to a slow loss of telomeric DNA. Proc Natl Acad Sci U S A 96: 14813–14818.
56. Zellinger B, Akimcheva S, Puizina J, Schirato M, Riha K (2007) Ku suppresses formation of telomeric circles and alternative telomere lengthening in Arabidopsis. Mol Cell 27: 163–169.
57. Riha K, Watson JM, Parkey J, Shippen DE (2002) Telomere length deregulation and enhanced sensitivity to genotoxic stress in Arabidopsis mutants deficient in Ku70. EMBO J 21: 2819–2826.



Initial Genomics of the Human Nucleolus

Attila Németh1, Ana Conesa2, Javier Santoyo-Lopez2, Ignacio Medina2, David Montaner2, Bálint Péterfia1, Irina Solovei3, Thomas Cremer3, Joaquin Dopazo2, Gernot Längst1*

1 Department of Biochemistry III, University of Regensburg, Regensburg, Germany, 2 Department of Bioinformatics and Genomics, Centro de Investigación Príncipe Felipe, Valencia, Spain, 3 Department of Biology II, Ludwig-Maximilians University of Munich, Planegg-Martinsried, Germany

Abstract

We report for the first time the genomics of a nuclear compartment of the eukaryotic cell. 454 sequencing and microarray analysis revealed the pattern of nucleolus-associated chromatin domains (NADs) in the linear human genome and identified different gene families and certain satellite repeats as the major building blocks of NADs, which constitute about 4% of the genome. Bioinformatic evaluation showed that NAD-localized genes take part in specific biological processes, like the response to other organisms, odor perception, and tissue development. 3D FISH and immunofluorescence experiments illustrated the spatial distribution of NAD-specific chromatin within interphase nuclei and its alteration upon transcriptional changes. Altogether, our findings describe the nature of DNA sequences associated with the human nucleolus and provide insights into the function of the nucleolus in genome organization and establishment of nuclear architecture.

Citation: Németh A, Conesa A, Santoyo-Lopez J, Medina I, Montaner D, et al. (2010) Initial Genomics of the Human Nucleolus. PLoS Genet 6(3): e1000889. doi:10.1371/journal.pgen.1000889

Editor: Asifa Akhtar, Max-Planck-Institute of Immunobiology, Germany

Received November 18, 2009; Accepted March 1, 2010; Published March 26, 2010

Copyright: © 2010 Németh et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: GL and TC are supported by the Deutsche Forschungsgemeinschaft (DFG); GL by the Bayerisches Genomforschungsnetzwerk (BayGene); AN and GL by the University of Regensburg - DFG Anschubfinanzierung; and AC, DM, IM, JS-L, and JD by grants from project BIO BIO2008-04212 from the Spanish Ministry of Science and Innovation (MICINN) and grant (RD06/0020/1019) from Red Temática de Investigación Cooperativa en Cáncer (RTICC), Instituto de Salud Carlos III (ISCIII), MICINN. The National Institute of Bioinformatics (www.inab.org) is a platform of Genoma España. The CIBER de enfermedades raras is an initiative of the ISCIII. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: gernot.laengst@vkl.uni-regensburg.de

Introduction

The largest and densest nuclear compartment is the nucleolus with its shell of perinucleolar DNA. The nucleolus is a unique object to study genome activity, since all three RNA polymerases are involved in the highly dynamic and tightly regulated ribosome biogenesis process, which is its main function. High proliferation activity of tumour cells coincides with high ribosome biogenesis activity, thus exposing the nucleolus as a promising target in cancer therapy [1]. In addition, cell-type and function-dependent nucleolar localisation of tumour suppressor proteins, such as p53, MDM2 or p14ARF, indicates the role of the nucleolus in carcinogenesis [2–5]. A number of other biological processes (e.g. senescence, RNA modification, cell-cycle control and stress sensing) are also regulated in the nucleolus and connect it to several functional networks of the cell [2–7]. Furthermore, chromatin motion is constrained at nucleoli or the nuclear periphery, and disruption of nucleoli increases motility of chromatin domains, indicating the role of the nucleolus in higher-order chromatin arrangement [8]. The nucleolus can therefore be considered a well-suited model system to investigate functional consequences of genome organisation.

It is less well known, however, that alterations in the nucleolus might be linked to multiple forms of human disease, including viral infections. The interaction between viruses and the nucleolus is a pan-virus phenomenon, which is exhibited by DNA viruses, retroviruses and RNA viruses [9,10]. Moreover, multiple genetic disorders have been mapped to genes that encode proteins located in nucleoli under specific conditions. These include Werner [11], fragile X [12,13], Treacher Collins [14], Bloom [15], Rothmund–Thomson [16] and dyskeratosis congenita syndromes [17] and Diamond-Blackfan anemia [18].

Nucleoli are easily detectable under the microscope; however, despite the simple methods of nucleolus isolation, their molecular structure is largely unknown. The nucleolar proteome has been recently analysed by high-throughput mass-spectrometry [19], but the nucleic acid composition of nucleoli had not yet been determined. Therefore the aim of our investigations was to construct and characterize the first high-resolution, genome-wide map of NADs. Recent advances in sequencing and microarray technologies provided excellent platforms to subject nucleolus-associated DNA (naDNA) to critical scrutiny. The results presented here help to understand the mechanisms of nuclear information packaging by macromolecular assemblies and the functional compartmentalisation of the nucleus.

Author Summary

It is becoming increasingly clear that the nuclear organization and location of genes in metazoan organisms is not random. Functionally related genes are often found next to each other in the linear genome, and distant DNA elements or DNA regions residing on different chromosomes may reside in specific nuclear compartments. The largest nuclear compartment is the nucleolus with its shell of perinucleolar DNA. The nature of the nucleolus-associated DNA, the targeting mechanism, and the cellular function of this subset of genomic DNA are not known. In the present study we report for the first time the high-resolution analysis of a nuclear compartment by sequencing, microarray analysis, and single-cell analysis. We have characterized the nucleolus-associated DNA on sequence level and by 3D microscopy and have determined common elements and the molecular function of this compartment.

Results/Discussion

Because the nucleolar proteome was analysed in HeLa cells [19], our study started with the purification of nucleoli from this widely used model system (Figure 1A). Enrichment of the nucleolar transcription factor UBF and depletion of the nuclear lamina proteins lamin A/C from the nucleolar fraction was monitored by Western blot. Nucleolus-associated DNA was then isolated, and ribosomal DNA (rDNA) enrichment was measured by quantitative PCR (Figure S1). To analyse the genomic localisation of purified naDNA at low resolution, we performed 2D FISH experiments. Hybridisation of naDNA on human lymphocyte metaphase spreads shows that it appears predominantly on p-arms of acrocentric chromosomes, the location of the repetitive rDNA, and on centromeres of several chromosomes. The addition of the repetitive Cot1 competitor DNA suppresses binding of the naDNA probe to various chromosomal regions, but not to rDNA-containing nucleolar organiser regions (NORs). The result demonstrates that rDNA, as well as pericentromeric and centromeric repetitive sequences, is overrepresented in naDNA compared to other chromosomal regions (Figure 1B).

Next, naDNA was analysed using Nimblegen whole genome microarrays at 6,270-bp median probe spacing resolution and compared to genomic DNA by performing two-colour hybridisation (aCGH). The aCGH data reinforced the results of the 2D FISH experiments: p-arm-adjacent regions of the acrocentric chromosomes and pericentromeric regions are enriched in naDNA. More interestingly, many other chromosomal regions are also present in the naDNA fraction (Figure 2A, Figure S2 and S3). For example, a large part of chromosome 19 associates with the nucleolus (Figure 3E). This finding explains the presence of chromosome 19 in central regions of the interphase nucleus [20], being close to the nucleoli.

To elucidate NAD-specific sequence signatures in more detail, 454 sequencing was performed. In total 47,378,399 bases were sequenced in 218,030 reads with an average length of 217 bases/read. We used the complementary set of microarray and sequencing data to visualise the genome-wide localisation of NADs. Genome-wide studies are performed almost exclusively using one high-throughput strategy, which limits the quality of the detection. The combination of techniques compensates for the inherent errors of the individual methods. Our results clearly show that certain NADs are detectable only with one of these approaches (Figure S2 and Table S1). It is important to mention that the p-arms of the five acrocentric chromosomes,

Figure 1. Genome-wide analysis of nucleolus-associated DNA. (A) Experimental strategy. (B) 2D FISH analysis of nucleolus-associated DNA on human female lymphocyte metaphase spreads in the absence (-Cot1) or presence (+Cot1) of Cot1 competitor DNA. Arrows indicate chromosome 1 centromeres, arrowheads indicate p-arms of acrocentric chromosomes. doi:10.1371/journal.pgen.1000889.g001
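The sequencing summary quoted in the text (47,378,399 bases in 218,030 reads, average 217 bases/read) can be checked with a single division. The short Python sketch below does only that; the variable names are illustrative and not taken from the study.

```python
# Values quoted in the text for the 454 sequencing run.
total_bases = 47_378_399
total_reads = 218_030

# Mean read length is total sequenced bases divided by the number of reads.
mean_read_length = total_bases / total_reads

print(f"mean read length: {mean_read_length:.1f} bases/read")
```

Rounded to the nearest base, the result agrees with the reported average of 217 bases/read.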

PLoS Genetics | www.plosgenetics.org


March 2010 | Volume 6 | Issue 3 | e1000889


Nucleolus Genomics

Figure 2. Genomic and size distribution of NADs. (A) Distribution of NADs together with satellite repeats along human chromosomes. Note that the p-arms of the five acrocentric chromosomes (13, 14, 15, 21 and 22) were not analysed because they are not assembled in the hg18 genome build. NADs are labeled with red, satellite repeats with deep blue, centromeres with yellow and chromosomes with light blue. (B) Histogram of NAD sizes; median = 749 kb; a total of 97 NADs were identified. doi:10.1371/journal.pgen.1000889.g002
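The NAD summary figures reported in this article (97 domains, 126,217,765 bp in total, median size 749 kb, about 4% of the genome) can be related to one another with simple arithmetic. The sketch below is a sanity check rather than a reproduction of the authors' analysis: the genome length is an assumed round value of roughly 3.08 Gb and is not stated in the text.

```python
# Values quoted in the article.
NAD_TOTAL_BP = 126_217_765
NAD_COUNT = 97
NAD_MEDIAN_BP = 749_000

# Assumption (not from the article): approximate total length of the human
# reference genome, used only to sanity-check the "about 4%" figure.
ASSUMED_GENOME_BP = 3_080_000_000

genome_fraction = NAD_TOTAL_BP / ASSUMED_GENOME_BP
mean_nad_bp = NAD_TOTAL_BP / NAD_COUNT

print(f"NAD share of genome: {genome_fraction:.1%}")
print(f"mean NAD size: {mean_nad_bp / 1e6:.2f} Mb")
print(f"mean / median: {mean_nad_bp / NAD_MEDIAN_BP:.2f}")
```

Under this assumption the share comes out close to 4%. The mean domain size (about 1.3 Mb) is larger than the reported median, which is what one would expect if a few large domains skew the size distribution to the right.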

which contain rDNA and satellite repeats, are not represented in the hg18 genome build and, therefore, were not included in our analysis. In addition to the previously described pericentromeric locations, a significant number of the NADs (nine) localised in subtelomeric regions. Altogether, 97 chromosomal regions that are associated with nucleoli were identified, encompassing about 4% (126,217,765 bp) of the genome. Our study detected the most frequent nucleolus-associated chromosome domains using stringent cut-off parameters for domain definition (Figure 2A, Figure S2 and S3, Table S1, and Materials and Methods).

After genome-wide NAD identification, sequence and chromatin features were compared to the whole genome and lamina-associated domains (LADs). LADs were recently determined by high-resolution mapping using DamID technology [21]. The size distribution (0.1–10 Mb) and median sequence length (749 kb) of NADs (Figure 2B) were similar to those of LADs (0.1–10 Mb, 553 kb), suggesting that the architectural units of chromosome organisation within the mammalian interphase nucleus are about 0.5–1 Mb in length.

One thousand thirty-seven genes have been identified within NAD sequences according to the RefSeq gene database, 729 of which were non-redundant (Table S2). Surprisingly, certain gene families were frequently associated with the nucleoli, even though the overall gene density in NADs is about 20% lower than in the whole genome. We observed a 4-fold enrichment of zinc-finger (ZNF) genes in NADs compared to the genome. Olfactory receptor (OR) and defensin genes were enriched in both NADs and LADs, but the enrichment was far greater in NADs (Figure 3A). Moreover, two of the six large clusters of immunoglobulin and T-cell receptor genes [22] overlap with NADs, and one other is juxtaposed to a NAD (Figure S3). The gene families mentioned above have two common features: their members are in large gene clusters, and they are expressed in a tissue-specific manner. This phenomenon suggests that these large chromosomal regions may change their sub-nuclear position with regard to their transcriptional activity. In addition, both immunoglobulin and OR genes exhibit monoallelic expression [23,24]; therefore, nucleoli may be involved in this type of gene regulation. However, this has to be tested for each individual gene in specific model systems.

Besides the response to other organisms and odour perception, additional biological processes and molecular functions are specifically associated with genes localised in the vicinity of the nucleolus, including tissue development and embryo implantation (Figure S4 and S5 and Table S3). Carcinoembryonic antigen cell adhesion molecule (CEACAM) genes and pregnancy-specific glycoprotein (PSG) gene clusters, whose protein products regulate implantation, were also found next to and within NADs, respectively. Additionally, a large number (119) of small nucleolar RNA (snoRNA) genes were identified within one NAD on chromosome 15. However, this association may be explained by the close proximity of this cluster to the rDNA repeats (distance of 5 Mb).

RNA genes located within NADs were characterized using the datasets of the ‘RepeatMasker’ and ‘RNA Genes’ databases of the Genome Browser. Both analyses show that 5S and tRNA genes, both of which are transcribed by RNA polymerase III, are specifically enriched in NADs but not in LADs. In contrast, other RNA genes are distributed with a similar frequency in NADs and the rest of the genome (Figure 3B). This finding confirms that RNA polymerase III-transcribed genes co-localise with nucleoli [25–27], which are the site of RNA polymerase I transcription. These observations suggest that spatial regulation may play a role in coordinated, well-tuned transcription of the RNA components of the protein translation machinery.

Analysis of the repetitive elements showed a more than 10-fold enrichment of satellite repeats in NADs and depletion of SINE -

Figure 3. Sequence and chromatin features of NADs. (A) RefSeq gene, (B) RNA gene and (C) repeat statistics of NADs, genome and LADs. ZNF, OR and DEF indicate zinc finger, olfactory receptor and defensin gene families, respectively. RNA gene analysis of the ‘RepeatMasker’ and ‘RNA Genes’ tracks of the UCSC Genome Browser are shown on the left and right, respectively. (D) Chromatin features of NADs. Enrichment of functionally characterised repressive histone marks H3K27Me3, H3K9Me3 and H4K20Me3 in NADs are shown on the left, whereas depletion of the active histone mark H3K4Me1 is shown on the right. Genome, NADs and LADs values are labeled uniformly in (A–D) with black, red and white, respectively. The complete analysis is summarised in Table S5 and S6. (E) NADs and their typical genomic features on chromosome 19. Brown rectangle indicates the centromere. Abbreviations: UR (Universität Regensburg) NADs – nucleolus-associated chromatin domains identified in this study, PolI pseudo – pseudogenes of RNA polymerase I transcribed rRNA genes, OR – olfactory receptor genes, ZNF – zinc finger genes, tRNA – transfer RNA genes (and pseudogenes) transcribed by RNA polymerase III, NKI (Nederlands Kanker Instituut) LADs – lamin-associated chromosome domains identified in the Tig3 cell line [21]. doi:10.1371/journal.pgen.1000889.g003
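Fold enrichments of the kind reported for gene families and repeats in NADs can be understood as a ratio of feature densities: features per base pair inside the domains divided by features per base pair genome-wide. The exact normalisation used by the authors is not described in this section, so the function below is an assumed, generic definition, and all counts in the example are hypothetical numbers chosen only to illustrate the calculation.

```python
def fold_enrichment(count_in_domains: int, domain_bp: int,
                    count_in_genome: int, genome_bp: int) -> float:
    """Ratio of feature density inside the domains to the genome-wide density.

    This is a generic, assumed definition, not the authors' documented method.
    """
    if min(domain_bp, genome_bp, count_in_genome) <= 0:
        raise ValueError("lengths and genome-wide count must be positive")
    domain_density = count_in_domains / domain_bp
    genome_density = count_in_genome / genome_bp
    return domain_density / genome_density


# Hypothetical example: domains cover 4% of a 100 Mb toy genome and contain
# 40 of the 250 members of a gene family.
example_fold = fold_enrichment(
    count_in_domains=40, domain_bp=4_000_000,
    count_in_genome=250, genome_bp=100_000_000,
)
print(f"fold enrichment: {example_fold:.1f}")
```

A value above 1 indicates enrichment in the domains and a value below 1 indicates depletion, so the same function covers both the enriched satellite repeats and the depleted SINE class described in the text.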


especially MIR repeats (Figure 3C). We next performed a detailed quantitative analysis of all major satellite repeat subclasses located within NADs (Figure S6). Our results demonstrate that the major building blocks of NADs are the alpha-, beta- and (GAATG)n/(CATTC)n-satellite repeats, whereas other types of satellite repeats (e.g. MSR1, D20S16, SATR2) were depleted. These data confirm and extend previous studies [28,29] that describe nucleolar association of satellite repeats, but do not analyse them in detail. Taken together with the fact that D4Z4 macrosatellite repeats are located on the short arms of acrocentric chromosomes [30] and that ‘RepeatMasker’ does not contain information about low copy number repeats (e.g., segmental duplications or macrosatellites), we extended our investigations to such repetitive elements and showed that these genomic features are enriched in NADs (Figure S3 and Table S4). The presence of low-copy number repeats in NADs underlies the difficulties of alignment-based localisation of naDNA sequences within the genome: segmental duplications and major satellites will be mapped to more than one region [31,32], thus the nucleolar association of chromosome regions containing such sequences has to be confirmed by neighbouring sequences or in 3D FISH experiments. Enrichment of satellites and segmental duplications in NADs may also explain the assignment of several domains to chromosome Y even though HeLa cells are derived from a female. The Y chromosome has been shown to co-localise with nucleoli in the interphase nucleus [29,33], indicating that such low-copy number repeats may be involved in nucleolar targeting. The detailed map of nucleolus-associated chromosomal regions and genomic features enriched in NADs is shown in Figure 3E for chromosome 19. The complete set of data is shown in Figure S3 and Table S5.
In order to reveal specific chromatin patterns enriched within the nucleolus-associated chromatin domains, we used the genome-wide maps of histone modifications [34–36]. Multiple repressive histone marks were specifically enriched, whereas the active histone mark H3K4Me1 was significantly depleted in NADs. As mirrored by the enrichment of repressive histone marks, we observed reduced global gene expression in NADs (Figure 3D and Table S6). These findings imply that NADs tend to form large inactive chromatin domains in the interphase nucleus. However, nucleolus-associated inactive chromatin differs markedly from lamina-associated inactive chromatin in the kind of repetitive elements and the gene-associated biological processes, suggesting that multiple domains of functionally distinct inactive chromatin exist within the nucleus. Furthermore, the presence of the highly expressed classes of 5S RNA and tRNA genes in nucleolus-associated chromatin indicates that the perinucleolar region is not exclusively transcriptionally silent. We used 3D immuno-FISH to confirm whether NADs revealed by the high-throughput methods co-localise with nucleoli. Nucleoli were stained with an α-B23/nucleophosmin antibody, and we chose 11 genomic loci, which were analysed with appropriate BAC clones. Target, negative and positive control regions were selected from different chromosomes (Table S7, Figure S7, and Materials and Methods). The pericentromeric Xq11.1 region and the 5S rDNA cluster at 1q42.13 served as positive controls [26,37]. The combination of microarray and high-throughput sequencing analysis revealed a high-fidelity list of nucleolus-associated DNA, as all of our selected NADs were more frequently associated with nucleoli of HeLa cells than the negative controls. To test whether the nucleolar association of these chromosomal regions is a cell type-specific feature or a general property of human cells, IMR90 embryonic lung fibroblasts were analysed.
In contrast to HeLa, IMR90 cells possess diploid karyotype and they are not immortal. Except the 5S rDNA cluster on chromosome 1, all PLoS Genetics | www.plosgenetics.org

selected regions showed similar levels of nucleolar association in IMR90 and HeLa cells (Figure 4A and Figure S8), suggesting that the nucleolar targeting of certain chromosomal regions is a common feature in human cells. We next addressed the function of transcription in DNA targeting to the nucleolus by monitoring nucleolus association of selected chromosomal domains upon transcriptional inhibition. We used α-amanitin to block transcription by RNA polymerases II and III, whereas the synthesis of the 47S rRNA precursor was repressed by the addition of actinomycin D. We found that the specific inhibition of any of the RNA polymerases results in spatial reorganization of the nucleolus-associated domains (Figure S9 and Table S7), which indicates that the nucleolus forms a functional unit together with the associated perinucleolar chromatin. However, the concomitant partial disruption of nucleolar structures [38] makes the interpretation of such experiments difficult. In addition to localisation studies of single chromosomal regions, three typical features of the perinucleolar chromatin were visualised. To this end, five-colour immunofluorescence experiments were performed, which allowed direct comparison of the signal distributions of centromere, H3K27Me3 and active RNA polymerase II localisations in the same cell. RNA polymerase II transcription was depleted around nucleoli; furthermore, the frequent association of H3K27Me3 and centromere signals with nucleoli reinforced the results of the bioinformatic analysis of NADs. Both HeLa and IMR90 cells showed similar localisation of these nuclear marks, and the observed punctate patterns suggest that functionally distinct chromatin domains co-exist around nucleoli (Figure 4B and 4C and Figure S10). We report here the mapping and characterization of nucleolus-associated chromatin domains in the human genome.
Bioinformatics and statistical analyses reveal that the main building blocks of NADs are certain types of satellite repeats, tRNA and 5S RNA genes and members of the ZNF, OR, defensin and immunoglobulin gene families. Thus, our data suggest that certain types of satellite repeat sequences play an important role in establishing NADs. The internal scaffold of the nucleolus, the rDNA repeats, was analysed only by qPCR (Figure S1), but not in our high-throughput studies, for several reasons: i) they are not represented in the hg18 genome build, ii) repetitive sequences are not printed on microarrays, iii) the number of 454 sequencing reads depends on the GC content, which is very variable throughout the rDNA repeat (Figure S11). The findings of a recent publication indicate that centromeric nucleoprotein complexes may be targeted to the nucleolus via an alpha-satellite RNA-mediated mechanism [39], and point to the importance of transcription in this process. These data suggest that transcription has a general regulatory role in maintaining the nuclear architecture around the nucleolus. The transcribed RNA may be bound by nucleolar RNA-binding proteins, which sequester NADs to the nucleolar periphery. On the other hand, our results imply that there is no unique predictor sequence: in addition to certain satellite repeats, other elements, e.g. tRNA genes and 5S RNA genes, may be sufficient for the nucleolar targeting of individual chromatin domains. The aforementioned DNA elements, together with specific RNA molecules and scaffold proteins like UBF, may coordinate the (at least partial) self-assembly of the nucleolus with its shell. The principles of the assembly might be similar to those demonstrated recently for the pseudo-NORs [40,41] and for the Cajal body [42], where single DNA, protein or RNA scaffolds were able to nucleate the formation of nuclear compartments. Further experiments are required to uncover the molecular steps of

March 2010 | Volume 6 | Issue 3 | e1000889


Nucleolus Genomics

Figure 4. 3D immuno–FISH analysis of nucleolus-associated chromatin domains. (A) Histograms show the frequency of the nucleolar localisation of NADs and control chromosomal regions detected by 3D FISH in HeLa cervix carcinoma and IMR90 diploid fibroblast cells. Percentage of nucleolus-associated alleles is shown on the left. Red diamond indicates target, green ones negative controls, whereas yellow diamond indicates the chromosome X pericentromeric and blue diamond the 5S cluster positive controls, respectively (see Table S7 for further BAC details). Single light optical sections of HeLa nuclei are shown on the right. BAC hybridization signals of RP11-90G23 target, RP5-915N17 positive control and RP11-81M8 negative control BACs are shown in green, nucleolar staining in red and DAPI counterstain in blue (scale bars: 5 µm). (B) α-H3K27Me3, α-centromere, α-active Pol II and α-B23/nucleophosmin immunostaining of HeLa and IMR90 cells. α-H3K27Me3, α-centromere and α-active Pol II signals are shown in green, nucleolar staining in red and DAPI counterstain in blue (scale bars: 5 µm). doi:10.1371/journal.pgen.1000889.g004

transcription-dependent nucleolar targeting of different groups of NADs and to identify the players in this process. The dynamics of nucleolus association during cell cycle and cell differentiation will be addressed in future studies. The functional organisation of the nuclear architecture is studied intensively [43–46] and the identification of NADs in the present work provides a basis for


bars in plots). 454 ‘Chip-Seq’ domains were selected as those areas with a running mean value above the 98th percentile of the values on the chromosome. This arbitrary threshold agrees well with visual evaluation of the 454 data as well as the aCGH data. Finally, 454 regions were edited and border positions were curated manually. The significance of the 454-based NAD determinations was assessed empirically by comparing the number of reads in each of the detected NADs against the distribution of the number of reads in 1000 randomly selected same-chromosome regions of the same size. The significance is then obtained as the quartile position of the NAD read number in the random distribution. 454 and aCGH domains were merged into one single list of NADs. For merging, overlapping regions from both technologies were fused into one domain. Domain borders were defined following the aCGH data unless the absence of array probes at merged borders suggested using the 454 limits. Furthermore, adjacent regions separated by less than 0.1 Mb were joined into single domains.
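The 454 domain-detection steps described above (binary presence/absence signal, running mean, percentile threshold, joining of nearby regions) can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the centred window with truncation at chromosome ends, the use of read start positions, and the nearest-rank percentile are all assumptions, and the manual curation of borders is not modelled.

```python
import math

def binarize_reads(read_positions, chrom_length, segment=100, spacing=1000):
    """Return 1/0 per sampling point: 1 if a read start falls inside the
    100-nt segment that begins at each 1000-nt step (assumed layout)."""
    n_bins = chrom_length // spacing
    signal = [0] * n_bins
    for pos in read_positions:
        idx, offset = divmod(pos, spacing)
        if idx < n_bins and offset < segment:
            signal[idx] = 1
    return signal

def running_mean(signal, window=100):
    """Running mean over `window` sampling points (100 points = 0.1 Mb here).
    The window is centred and truncated at the ends (an assumption)."""
    prefix = [0]
    for value in signal:
        prefix.append(prefix[-1] + value)
    half = window // 2
    n = len(signal)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i - half + window)
        smoothed.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return smoothed

def percentile(values, p):
    """Nearest-rank percentile (one of several common definitions)."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

def call_domains(smoothed, threshold, spacing=1000):
    """Contiguous runs with running mean above the threshold, in bp."""
    domains, start = [], None
    for i, value in enumerate(smoothed):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            domains.append((start * spacing, i * spacing))
            start = None
    if start is not None:
        domains.append((start * spacing, len(smoothed) * spacing))
    return domains

def join_adjacent(domains, max_gap=100_000):
    """Join regions separated by less than 0.1 Mb into single domains."""
    merged = []
    for start, end in sorted(domains):
        if merged and start - merged[-1][1] < max_gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def detect_domains(read_positions, chrom_length):
    """Chain the steps with the parameters quoted in the text."""
    smoothed = running_mean(binarize_reads(read_positions, chrom_length), 100)
    threshold = percentile(smoothed, 98)
    return join_adjacent(call_domains(smoothed, threshold))
```

Calling `detect_domains(read_starts, chromosome_length)` on the mapped read start positions of one chromosome would return candidate `(start, end)` intervals in base pairs, which in the described workflow were then edited manually and merged with the aCGH domains.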

the better understanding of the role of nucleoli in the spatial organisation of the human genome.

Materials and Methods

Population average–based analyses

HeLa cervix carcinoma cells were cross-linked with 1% formaldehyde and nucleoli were isolated as described [47]. rDNA content of equal amounts of naDNA and genomic DNA was quantified in real-time PCR reactions. Oligonucleotide sequences: Hr132F: 5′-CCTGCTGTTCTCTCGCGC, Hr155P: 5′-FAM-AGCGTCCCGACTCCCGGTGC-TAMRA, Hr198R: 5′-GGTCAGAGACCCGGACCC; Hr9776F: 5′-GCCACTTTTGGTAAGCAGAACTG, Hr9802P: 5′-FAM-CTGCGGGATGAACCGAACGCC-TAMRA, Hr9840R: 5′-CATCGGGCGCCTTAACC. Numbers indicate rDNA (GenBank Acc. No U13369) position relative to the transcriptional start site. Two rDNA regions were measured in technical triplicates from two biological replicate experiments. UBF and laminA/C protein levels were monitored with the sc-9131 and sc-20681 antibodies (Santa Cruz Biotechnology), respectively. naDNA was isolated and subjected to 454 sequencing (MWG-Biotech) and microarray analysis on HG18 CGH 385K WG Tiling v1.0 platform (Nimblegen). Genomic features of NADs were analysed using the UCSC Table Browser (http://genome.ucsc.edu/cgi-bin/hgTables) and chromatin features using the Ensembl Database (http://www.ensembl.org) and the GSE12889 NCBI GEO dataset. Genomic features were visualised using Galaxy (http://galaxy.psu.edu/) and the UCSC Genome Browser (http://genome.ucsc.edu/). All analyses were performed on the hg18 genome build. Biological processes and molecular functions associated with NAD-located genes were analysed by using FatiGO [48]. Array CGH, 454 sequencing and subsequent data analysis were performed as follows: naDNA samples from two biological replicate experiments were subjected to microarray analysis on HG18 CGH 385K WG Tiling v1.0 platform. Hybridisation and pre-processing of hybridisation signals were performed at Nimblegen.
For each of the samples, regions of increased intensity measurements were considered to be relevant if their mean value was greater than the 85th percentile of the sample distribution at 0.1 Mb running window size. Only the intersection of relevant regions across the microarray replicates was considered as a NAD. High-throughput sequencing was performed using the Roche GS FLX system. One of the aCGH analysed naDNA samples was taken as template for sequencing. 454 sequence reads were quality filtered and automatically assembled into contigs with the Newbler Assembler software at MWG-Biotech. Contigs were matched against the human genome using BLAT. Repeat masked sequences were used both for 454 data and genome data. For matching, 95% sequence identity and coverage were required and a maximum gap size of 3 was permitted. Of the mapped reads, 88% had unique hits. The 454 data were widely spread over the genome. Only a few regions had higher intensity, mainly around centromeres. For domain detection, 454 data were first transformed into a binary (1/0) signal indicating presence/absence of mapped reads at chromosome positions defined by 100 nt long segments spaced 1000 nt apart. A running mean algorithm was run on these data with a window size of 100 (which implies an actual chromosome window size of 0.1 Mb), to identify chromosomal regions with higher abundance of 454 sequencing hits (red

Data deposition

Microarray data have been submitted to the ArrayExpress Database (http://www.ebi.ac.uk/microarray-as/ae/) under accession number E-MEXP-2403. 454 sequencing data have been submitted to the Sequence Read Archive (http://www.ncbi.nlm.nih.gov/Traces/sra/) under accession number SRA009887.3.

Single-cell experiments

2D FISH experiments were performed on HeLa and human female lymphocyte metaphase spreads according to standard protocols. naDNA was labelled without amplification. NAD target and control BACs were selected as follows: RP11-434B14 (Xq11.1; ‘X cen’) and RP5-915N17 (1q42.13; ‘5S’) were used as positive controls. Perinucleolar localisations of the X chromosome and the large 5S rDNA cluster on chromosome 1 were reported previously [26,37]. RP11-90G23 (8q21.2; ‘REXO1’) and RP11-173M10 (13q21.1; ‘7SK’, encompassing a 7SK RNA gene) were selected based on 454 sequencing data. In the latter case we tested whether smaller 454 signals, which did not identify NADs, could also be associated with nucleoli. RP11-44B13 (19q13.12; ‘27ZNF’), selected based on our microarray data, marks a chromosomal fragment in FISH experiments where 27 KRAB-ZNF genes are located. The KRAB-ZNF gene cluster at 19q13.12 represents a SUV39H1 and CBX1 binding region. Our 3D FISH results reveal spatial features of this locus, which was formerly characterized at the level of chromatin domain organisation [49]. RP11-89H10 (3p12.3; ‘FRG2C’) and RP11-413F20 (10q26.3; ‘FRG2B’) were selected from combined aCGH/454 and aCGH results, respectively. Both chromosomal regions contain D4Z4 major satellite repeats which may have nucleolar targeting potential. RP11-89O2 (3p14.1; ‘FRG2C ctrl’) and RP11-123G19 (10q24.1; ‘FRG2B ctrl’) served as negative controls for the latter two targets. RP11-81M8 (19p13.3; ‘REXO1’) covers a large 2 Mb chromosome fragment. This region contains the REXO1 gene, thus having similarity at the primary sequence level to the REXO1L target, and serves as its negative control. The negative control of the ZNF gene cluster (RP11-1137G4; 19p13.3-19p13.2; ‘ZNF557’) contains a single ZNF gene. 3D immuno-FISH experiments were performed as described [50].
In localisation experiments α-B23/nucleophosmin (Sigma, B0556), α-H3K27Me3 (Upstate, 07-449), α-active Pol II (Covance, MMS-129R), α-centromere (Antibodies Inc., 15–134) and different fluorescence dye-conjugated secondary antibodies, furthermore BAC clones RP11-90G23, RP11-173M10, RP11-44B13, RP11-89H10, RP11-413F20, RP11-81M8, RP5-915N17, RP11-1137G4, RP11-89O2, RP11-123G19 and RP11-434B14


to the genome was performed using the FatiGO strategy [48] included in the Babelomics suite (www.babelomics.org). Enrichment of different features is indicated in red. Statistical values are listed in Table S3. For better view use >300% zoom. Found at: doi:10.1371/journal.pgen.1000889.s004 (0.19 MB PDF)

were used on HeLa cervix carcinoma cells and IMR90 lung embryonic fibroblasts. HeLa cells were treated with 75 µg/ml or 300 µg/ml α-amanitin for 5 hours in order to inhibit RNA polymerase II or RNA polymerases II and III. RNA polymerase I mediated synthesis of the rRNA precursor was impaired by treatment of the cells with 50 ng/ml actinomycin D for 1 hour. Cells were fixed and 3D immuno-FISH experiments were performed. Confocal microscopy and image analysis were performed after 3D FISH experiments as follows: series of optical sections through 3D-preserved nuclei were collected using a Leica TCS SP5 confocal system equipped with a Plan Apo 63×/1.4 NA oil immersion objective and a diode laser (excitation wavelength 405 nm) for DAPI, an argon laser (488 nm) for FITC and Alexa 488, a DPSS laser (561 nm) for Cy3, a HeNe laser (594 nm) for Texas Red and a HeNe laser (633 nm) for Cy5. For each optical section, signals in different channels were collected sequentially. Stacks of 8-bit gray-scale images were obtained with a z-step of 200 nm and pixel sizes of 30–100 nm depending on the experiment. The axial chromatic shift was corrected and corresponding RGB stacks, montages and maximum intensity projections were created using published ImageJ plugins [51]. Positions of FISH signals were assessed by visual inspection of RGB stacks using the ImageJ program.

Figure S5 Molecular functions associated with NAD-located RefSeq genes. Statistical analysis of feature enrichment compared to the genome was performed using the FatiGO strategy [48] included in the Babelomics suite (www.babelomics.org). Enrichment of different features is indicated in red. Statistical values are listed in Table S3. Found at: doi:10.1371/journal.pgen.1000889.s005 (0.18 MB PDF)

Figure S6 Satellite repeats in NADs and naDNA. The upper panel shows the number of different satellite repeats located in NADs compared to the genomic values. Repeat counts of 454 sequence reads shown in the lower panel reveal other quantitative aspects of the contribution of different satellite repeats to naDNA. Notably, satellite repeats located on the p-arms of the five acrocentric chromosomes (13, 14, 15, 21, and 22) are not included in the NAD analysis, but they appear in the naDNA analysis. Stars indicate repeats of which a substantial amount (30%–50%) is located on chromosome Y and thus missing from female HeLa cells. Found at: doi:10.1371/journal.pgen.1000889.s006 (0.13 MB PDF)

Supporting Information

Figure S7 2D FISH analysis of BAC clones on human female lymphocyte and HeLa metaphase spreads. Lymphocytes are shown on the left and HeLa on the right panels. DAPI counterstaining is shown in red, BAC hybridization in green. White arrowheads point to BAC signals. Chromosomal localisation was verified by using chromosome paints (not shown). ID codes, chromosomal locations and BACPAC ID numbers of the BACs are indicated. Genomic coordinates of all BACs are shown in Table S7, locations in Figure S3. All BAC clones delivered 2 signals in lymphocytes except RP11-89H10. However, cross-reaction signals could be filtered since they were significantly less intense than the specific signals. BAC clones delivered 3 signals in HeLa except RP11-89H10, RP11-89O2, RP11-434B14 (2 signals) and RP11-173M10 (4 signals). Again, cross-reaction signals could be filtered in the case of RP11-89H10. Found at: doi:10.1371/journal.pgen.1000889.s007 (0.13 MB PDF)

Figure S1 Controls of nucleolus purification. Left panel: differential interference contrast (DIC) micrographs show formaldehyde cross-linked HeLa cells and isolated nucleoli. Right panel: UBF and laminA/C immunoblot controls of a nucleolus preparation. Lane 1 shows the input, 2 and 3 the supernatants of the two-step purification [47] and 4 the nucleolar fraction. 0.5% of each fraction was loaded. Quantitative PCR measurement illustrates the enrichment of ribosomal DNA in nucleolus-associated DNA (naDNA) compared to genomic DNA (gDNA). Mean and standard deviation values of two biological replicate experiments are shown. Found at: doi:10.1371/journal.pgen.1000889.s001 (0.04 MB PDF)

Figure S2 Nucleolus association maps on all chromosomes detected with 454 sequencing and/or aCGH analysis. 454 and aCGH signals are marked by red and blue bars, 454 and aCGH detected NADs by red and blue rectangles, respectively. Found at: doi:10.1371/journal.pgen.1000889.s002 (1.73 MB JPG)

Figure S8 Frequency of nucleolar localisation of NADs and control chromosomal regions detected by 3D FISH in HeLa cervix carcinoma and IMR90 diploid fibroblast cells. Percentage of cells containing at least one nucleolar-localised allele is shown. The results complement the data shown in Figure 4A and summarised in Table S7. Found at: doi:10.1371/journal.pgen.1000889.s008 (0.03 MB PDF)
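The two frequency measures used for the 3D FISH data (percentage of nucleolus-associated alleles in Figure 4A, and percentage of cells with at least one nucleolar-localised allele in Figure S8) differ only in what is counted. A minimal sketch, assuming a hypothetical per-nucleus record of `(associated_alleles, total_alleles)`; the function name and the example counts are illustrative and not taken from the study:

```python
def association_frequencies(alleles_per_cell):
    """alleles_per_cell: list of (associated_alleles, total_alleles), one
    tuple per scored nucleus (hypothetical input format).

    Returns (percent of alleles associated with nucleoli,
             percent of cells with at least one associated allele)."""
    if not alleles_per_cell:
        raise ValueError("no cells scored")
    total_alleles = sum(total for _, total in alleles_per_cell)
    associated = sum(assoc for assoc, _ in alleles_per_cell)
    cells_with_contact = sum(1 for assoc, _ in alleles_per_cell if assoc >= 1)
    allele_percent = 100 * associated / total_alleles
    cell_percent = 100 * cells_with_contact / len(alleles_per_cell)
    return allele_percent, cell_percent


# Illustrative counts only: four diploid nuclei.
example = [(1, 2), (0, 2), (2, 2), (0, 2)]
allele_pct, cell_pct = association_frequencies(example)
```

Because a cell counts once regardless of how many of its alleles touch a nucleolus, the per-cell percentage can be higher or lower than the per-allele percentage for the same data, which is why the two views complement each other.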

Figure S3 Linear map of NADs and their typical genomic features on the human genome. NADs and their selected, typical sequence features are shown on the map. BAC clones used in 3D FISH experiments are indicated on the top and LADs on the bottom over the Segmental Duplication track. Abbreviations: UR NADs - nucleolus-associated chromosome domains identified in this study, PolI pseudo - pseudogenes of RNA polymerase I transcribed rRNA genes, D4Z4 - D4Z4 major satellite repeats (see Table S4 for further information), OR - olfactory receptor genes, ZNF - zinc finger genes, DEF - defensin genes, 5S and tRNA - 5S rRNA and transfer RNA genes (and pseudogenes) transcribed by RNA polymerase III, NKI LADs - lamin-associated chromosome domains identified by Guelen et al. [21]. Immunoglobulin and T-cell receptor gene clusters are shown according to www.imgt.org [22]. For better view use >400% zoom. Segmental duplications are shown with the colour code identical to the UCSC Genome Browser (http://genome.ucsc.edu/). Found at: doi:10.1371/journal.pgen.1000889.s003 (0.92 MB PDF)

Figure S9 3D immuno-FISH analysis of NADs after inhibition of transcription. HeLa cells were treated with α-amanitin in order to inhibit RNA polymerase II (Pol II) or RNA polymerases II and III (Pol II+III). RNA polymerase I (Pol I) mediated synthesis of the rRNA precursor was impaired by treatment of the cells with actinomycin D. Histograms show the frequency of the nucleolar localisation of three chromosomal regions detected by the indicated BAC clones in 3D FISH experiments. Red, green and blue diamonds indicate target, negative control, and the 5S cluster positive control, respectively (see Table S7 for further BAC details). We used α-amanitin to block transcription by RNA polymerases II and III as described [Huang S, Deerinck TJ, Ellisman MH, Spector DL (1998) The perinucleolar compartment and transcription. J Cell Biol 143: 35–47.; Wang C, Politz JC, Pederson T, Huang S (2003) RNA polymerase III transcripts and the PTB protein are essential for the integrity of the perinucleolar

Figure S4 Biological processes associated with NAD-located RefSeq genes. Statistical analysis of feature enrichment compared

of 454 sequence hits per NAD is shown as well. The 454-based NAD determination was assessed with an empirical statistical test comparing the number of reads in each of the detected NADs against the distribution of the number of reads in 1,000 randomly selected same-chromosome regions of the same size. The significance is then obtained as the quartile position of the NAD read number in the random distribution. NADs that were analysed in 3D FISH experiments are highlighted in yellow. Found at: doi:10.1371/journal.pgen.1000889.s012 (0.03 MB XLS)
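The empirical test described for Table S1 can be sketched as a simple resampling procedure. This is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, reads are represented by sorted start positions, random regions are drawn uniformly along the chromosome, and the position in the random distribution is reported as the fraction of random regions holding fewer reads than the NAD.

```python
import random
from bisect import bisect_left

def count_reads(sorted_positions, start, end):
    """Number of read positions p with start <= p < end."""
    return bisect_left(sorted_positions, end) - bisect_left(sorted_positions, start)

def empirical_position(sorted_positions, chrom_length, start, end,
                       n_random=1000, seed=0):
    """Compare the read count of one NAD with n_random same-size regions
    drawn at random from the same chromosome.

    Returns the fraction of random regions that contain fewer reads than
    the NAD (values near 1 indicate strong enrichment)."""
    rng = random.Random(seed)  # fixed seed only for reproducibility
    size = end - start
    observed = count_reads(sorted_positions, start, end)
    fewer = 0
    for _ in range(n_random):
        random_start = rng.randrange(0, chrom_length - size + 1)
        if count_reads(sorted_positions, random_start, random_start + size) < observed:
            fewer += 1
    return fewer / n_random
```

Sorting the read positions once and counting with binary search keeps each of the 1,000 comparisons cheap, so the test can be repeated for every detected NAD on every chromosome.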

compartment. Mol Biol Cell 14: 2425–2435.], whereas the synthesis of the 47S rRNA precursor was repressed by the addition of actinomycin D as described in the related nucleolar proteome study [19] and in the Materials and Methods. The results show that the specific inhibition of any of the RNA polymerases results in spatial reorganisation of NADs, which indicates that the nucleolus forms a functional unit together with the associated perinucleolar chromatin. Notably, the structure of nucleoli is also partially disrupted after the indicated treatments [38] and thus the interpretation of such analyses is difficult. The results of these experiments are summarised in Table S7. Found at: doi:10.1371/journal.pgen.1000889.s009 (0.03 MB PDF)

Table S2 List of RefSeq genes located in NADs. Genes within NADs were identified with the UCSC Table Browser (RefSeq Genes Track, hg18 genome build). Note that almost 30% of the genes are duplicated or amplified to an even greater extent. Specific enrichment of different gene families in NADs is shown in Figure 3A. Found at: doi:10.1371/journal.pgen.1000889.s013 (0.39 MB XLS)

Figure S10 Quantitative immunofluorescence analysis of selected NAD features. α-H3K27Me3 and α-active Pol II immunostainings of HeLa and IMR90 cells were quantified around nucleoli by using the ImageJ software. After thresholding α-B23/nucleophosmin signals (indicated in blue), mean fluorescence intensity values were measured in the first 250 nm shell (red) and the second 250 nm shell (green) of 12 HeLa cells (22 nucleoli) and 16 IMR90 cells (56 nucleoli). The mean fluorescence intensity values were then divided to estimate enrichment or depletion. At the border of the nucleolus active Pol II and H3K27me3 show a clearly different distribution (p<0.001, Student’s t-test). Enrichment and depletion of the two markers in individual shells are significant in all cases (at least at the level p<0.05). Error bars are 95% confidence intervals. Found at: doi:10.1371/journal.pgen.1000889.s010 (0.99 MB PDF)
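The shell-based quantification in Figure S10 reduces to a ratio of mean intensities per nucleolus, summarised over all nucleoli. The sketch below is only an illustration: the direction of the division (first shell over second shell), the function names, and the normal-approximation confidence interval (factor 1.96) are assumptions, since the legend does not specify them.

```python
from math import sqrt
from statistics import mean, stdev

def shell_ratio(first_shell_mean, second_shell_mean):
    """Ratio of mean intensities in the first and second 250 nm shells.
    Values above 1 suggest enrichment next to the nucleolus, values
    below 1 suggest depletion (assumed direction of the division)."""
    if second_shell_mean <= 0:
        raise ValueError("second shell intensity must be positive")
    return first_shell_mean / second_shell_mean

def summarize(ratios):
    """Mean ratio with an approximate 95% confidence interval
    (normal approximation; a t-based interval would be slightly wider)."""
    centre = mean(ratios)
    half_width = 1.96 * stdev(ratios) / sqrt(len(ratios))
    return centre, (centre - half_width, centre + half_width)
```

For example, `summarize([shell_ratio(a, b) for a, b in measurements])` would give one summary value per marker and cell line, where `measurements` is a hypothetical list of per-nucleolus shell intensities exported from ImageJ.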

Table S3 Biological processes and molecular functions associated with NAD-located RefSeq genes. Statistical analysis of feature enrichment compared to the genome was performed using the FatiGO strategy [48] included in the Babelomics suite (www.babelomics.org). Results are summarised as graphs in Figures S4 and S5. Found at: doi:10.1371/journal.pgen.1000889.s014 (0.02 MB XLS)

Table S4 List of D4Z4 major satellite containing chromosomal regions of the hg18 genome build. BLAT search was performed using the HUMFSHD sequence (GenBank Accession: D38024) as query. Chromosomal regions with more than 10% (330 bp) homology were indicated on the NAD map (Figure S3). Found at: doi:10.1371/journal.pgen.1000889.s015 (0.02 MB XLS)

Figure S11 Ribosomal DNA in 454 sequence reads. The assembly of rDNA containing 454 sequence reads is shown in the upper part and the scheme of the rDNA repeat unit below (black arrows indicate the position and direction of individual reads). In total 3,231 rDNA containing DNA fragments were sequenced, of which 2,086 reads were assembled together with the rDNA repeat unit into a single sequence in a MacVector Assembly Project. The results clearly show that different regions were unequally represented in the deep sequencing data, which is probably due to the technical limitations of the method (i.e. emPCR-based amplification of fragments with different GC content is unequal). The negative correlation between the number of sequence reads and GC content can be easily visualized by comparing the assembly result with the GC content plot over the rDNA sequence (the plot was calculated with the EMBOSS Isochore program, http://www.ebi.ac.uk/Tools/emboss). The scheme of the rDNA repeat is shown at the bottom of the figure; 18S, 5.8S, 28S, and IGS mark the coding regions and the intergenic spacer of the human rDNA (GenBank AccNo: U13369), respectively; red and blue lollipops mark the transcriptional start and stop sites, respectively; ticks on the ruler indicate 1 kb distances. We would like to underline here again that the combination of two high-throughput methods, i.e. 454 and aCGH, helps to reduce technical problems, such as the bias in next-generation sequencing [Harismendy O, Ng PC, Strausberg RL, Wang X, Stockwell TB, et al. (2009) Evaluation of next generation sequencing platforms for population targeted sequencing studies. Genome Biol 10: R32.] and the lack of repetitive sequence information in the microarray-based method. Found at: doi:10.1371/journal.pgen.1000889.s011 (0.30 MB PDF)

Table S5 Statistical analysis of sequence features of NADs. Sequence features in NADs, genome and LADs were extracted from the UCSC Table Browser. Fisher’s exact test was performed to assess the significance of feature enrichments and the p-values are indicated. One-sided Fisher’s exact test was applied to test enrichment of genes of the selected gene families in NADs over the genome values. Two-sided Fisher’s exact test was applied to test enrichment/depletion of RNA genes and repeat families in NADs over the genome values. The statistical analysis of the enrichment of satellite repeats and depletion of SINE and in particular MIR repeats in NADs resulted in p = 0, thus they are not listed in the table. Although the differences between the observed NAD and genomic frequencies of other repeat types (LINE, Alu, LTR, DNA; p<<0.001) were also significant, the absolute differences were in these cases smaller than for satellites and MIRs and thus it is less likely that the latter repeats could possess specific nucleolar targeting and/or anchoring potential. The results of gene, RNA gene and repeat content analyses are illustrated as graphs in Figure 3A-3C, respectively. The detailed analysis of satellite repeat classes is shown in Figure S6. Found at: doi:10.1371/journal.pgen.1000889.s016 (0.02 MB XLS)

Table S6 Statistical analysis of chromatin features of NADs. Chromatin regulatory features in NADs were extracted from Ensembl Functional Genomics (eFG) database using Ensembl Perl API (Ensembl 50). These data were obtained by ChIP-seq analysis of lymphocytes [34]. The numbers indicate sequence reads per Mb. Additionally, gene expression and H3K27Me3 occupancy data for HeLa cells were obtained from the Gene Expression Omnibus Database (GSM323148, GSM323149, GSM325898;

Table S1 List of NAD genomic coordinates (hg18 genome build) and features of their detection. Chromosomal positions and sizes of NADs are shown in the table. The method of detection for each of the 97 NADs is also indicated: 41 NADs were detected with both microarray and high-throughput sequencing, 20 NADs only by using sequencing, and 36 NADs only on microarrays. The number


[35]). The numbers here indicate the sequence length occupied by the H3K27Me3 histone mark per Mb and mean values of gene expression in arbitrary units. Enrichment of features was tested by comparing the distribution of feature counts in NADs against the genome mean value using t-test statistics and adjusting p-values for multiple testing. Importantly, the analysis of HeLa H3K27Me3 and gene expression data reinforces the results obtained from lymphocytes. Genomic and NAD values of functionally characterised, significantly enriched or depleted chromatin marks are shown in Figure 3D. Found at: doi:10.1371/journal.pgen.1000889.s017 (0.05 MB XLS)

transcription inhibition experiments are summarised in the lower part of the table and illustrated in Figure S9. Found at: doi:10.1371/journal.pgen.1000889.s018 (0.02 MB XLS)

Acknowledgments

We thank M. Cremer, B. Joffe, D. Köhler, and H. Jahn-Henninger for helpful discussions and technical help. AN dedicates his work to the memory of Hédi.

Author Contributions

Conceived and designed the experiments: AN TC GL. Performed the experiments: AN BP IS. Analyzed the data: AN AC JSL IM DM. Contributed reagents/materials/analysis tools: AN TC JD GL. Wrote the paper: AN.

Table S7 Summary of 3D FISH experiments. BAC locations, allele and cell counts, furthermore nucleolus association frequencies in HeLa and IMR90 cells are shown. The results of

References

24. Pernis B, Chiappino G, Kelus AS, Gell PG (1965) Cellular localization of immunoglobulins with different allotypic specificities in rabbit lymphoid tissues. J Exp Med 122: 853–876.
25. Haeusler RA, Engelke DR (2006) Spatial organization of transcription by RNA polymerase III. Nucleic Acids Res 34: 4826–4836.
26. Matera AG, Frey MR, Margelot K, Wolin SL (1995) A perinucleolar compartment contains several RNA polymerase III transcripts as well as the polypyrimidine tract-binding protein, hnRNP I. J Cell Biol 129: 1181–1193.
27. Thompson M, Haeusler RA, Good PD, Engelke DR (2003) Nucleolar clustering of dispersed tRNA genes. Science 302: 1399–1401.
28. McStay B, Grummt I (2008) The epigenetics of rRNA genes: from molecular to chromosome biology. Annu Rev Cell Dev Biol 24: 131–157.
29. Stahl A, Hartung M, Vagner-Capodano AM, Fouet C (1976) Chromosomal constitution of nucleolus-associated chromatin in man. Hum Genet 35: 27–34.
30. Lyle R, Wright TJ, Clark LN, Hewitt JE (1995) The FSHD-associated repeat, D4Z4, is a member of a dispersed family of homeobox-containing repeats, subsets of which are clustered on the short arms of the acrocentric chromosomes. Genomics 28: 389–397.
31. Bailey JA, Yavor AM, Massa HF, Trask BJ, Eichler EE (2001) Segmental duplications: organization and impact within the current human genome project assembly. Genome Res 11: 1005–1017.
32. Sharp AJ, Locke DP, McGrath SD, Cheng Z, Bailey JA, et al. (2005) Segmental duplications and copy-number variation in the human genome. Am J Hum Genet 77: 78–88.
33. Bobrow M, Pearson PL, Collacott HE (1971) Para-nucleolar position of the human Y chromosome in interphase nuclei. Nature 232: 556–557.
34. Barski A, Cuddapah S, Cui K, Roh TY, Schones DE, et al. (2007) High-resolution profiling of histone methylations in the human genome. Cell 129: 823–837.
35. Cuddapah S, Jothi R, Schones DE, Roh TY, Cui K, et al. (2009) Global analysis of the insulator binding protein CTCF in chromatin barrier regions reveals demarcation of active and repressive domains. Genome Res 19: 24–32.
36. Wang Z, Zang C, Rosenfeld JA, Schones DE, Barski A, et al. (2008) Combinatorial patterns of histone acetylations and methylations in the human genome. Nat Genet 40: 897–903.
37. Bourgeois CA, Laquerriere F, Hemon D, Hubert J, Bouteille M (1985) New data on the in-situ position of the inactive X chromosome in the interphase nucleus of human fibroblasts. Hum Genet 69: 122–129.
38. Haaf T, Ward DC (1996) Inhibition of RNA polymerase II transcription causes chromatin decondensation, loss of nucleolar structure, and dispersion of chromosomal domains. Exp Cell Res 224: 163–173.
39. Wong LH, Brettingham-Moore KH, Chan L, Quach JM, Anderson MA, et al. (2007) Centromere RNA is a key component for the assembly of nucleoproteins at the nucleolus and centromere. Genome Res 17: 1146–1160.
40. Prieto JL, McStay B (2007) Recruitment of factors linking transcription and processing of pre-rRNA to NOR chromatin is UBF-dependent and occurs independent of transcription in human cells. Genes Dev 21: 2041–2054.
41. Prieto JL, McStay B (2008) Pseudo-NORs: a novel model for studying nucleoli. Biochim Biophys Acta 1783: 2116–2123.
42. Kaiser TE, Intine RV, Dundr M (2008) De novo formation of a subnuclear body. Science 322: 1713–1717.
43. Lanctot C, Cheutin T, Cremer M, Cavalli G, Cremer T (2007) Dynamic genome architecture in the nuclear space: regulation of gene expression in three dimensions. Nat Rev Genet 8: 104–115.
44. Sexton T, Schober H, Fraser P, Gasser SM (2007) Gene regulation through nuclear organization. Nat Struct Mol Biol 14: 1049–1055.
45. Takizawa T, Meaburn KJ, Misteli T (2008) The meaning of gene positioning. Cell 135: 9–13.
46. Zhao R, Bodnar MS, Spector DL (2009) Nuclear neighborhoods and gene expression. Curr Opin Genet Dev 19: 172–179.

1. Drygin D, Siddiqui-Jain A, O’Brien S, Schwaebe M, Lin A, et al. (2009) Anticancer Activity of CX-3543: A Direct Inhibitor of rRNA Biogenesis. Cancer Res. 2. Boisvert FM, van Koningsbruggen S, Navascues J, Lamond AI (2007) The multifunctional nucleolus. Nat Rev Mol Cell Biol 8: 574–585. 3. Mayer C, Grummt I (2005) Cellular stress and nucleolar function. Cell Cycle 4: 1036–1038. 4. Olson MO, Hingorani K, Szebeni A (2002) Conventional and nonconventional roles of the nucleolus. Int Rev Cytol 219: 199–266. 5. Sirri V, Urcuqui-Inchima S, Roussel P, Hernandez-Verdun D (2008) Nucleolus: the fascinating nuclear body. Histochem Cell Biol 129: 13–31. 6. McKeown PC, Shaw PJ (2009) Chromatin: linking structure and function in the nucleolus. Chromosoma 118: 11–23. 7. Tschochner H, Hurt E (2003) Pre-ribosomes on the road from the nucleolus to the cytoplasm. Trends Cell Biol 13: 255–263. 8. Chubb JR, Boyle S, Perry P, Bickmore WA (2002) Chromatin motion is constrained by association with nuclear compartments in human cells. Curr Biol 12: 439–445. 9. Hiscox JA (2002) The nucleolus–a gateway to viral infection? Arch Virol 147: 1077–1089. 10. Hiscox JA (2007) RNA viruses: hijacking the dynamic nucleolus. Nat Rev Microbiol 5: 119–127. 11. Marciniak RA, Lombard DB, Johnson FB, Guarente L (1998) Nucleolar localization of the Werner syndrome protein in human cells. Proc Natl Acad Sci U S A 95: 6887–6892. 12. Tamanini F, Kirkpatrick LL, Schonkeren J, van Unen L, Bontekoe C, et al. (2000) The fragile X-related proteins FXR1P and FXR2P contain a functional nucleolar-targeting signal equivalent to the HIV-1 regulatory proteins. Hum Mol Genet 9: 1487–1493. 13. Willemsen R, Bontekoe C, Tamanini F, Galjaard H, Hoogeveen A, et al. (1996) Association of FMRP with ribosomal precursor particles in the nucleolus. Biochem Biophys Res Commun 225: 27–33. 14. Isaac C, Marsh KL, Paznekas WA, Dixon J, Dixon MJ, et al. 
(2000) Characterization of the nucleolar gene product, treacle, in Treacher Collins syndrome. Mol Biol Cell 11: 3061–3071. 15. Yankiwski V, Marciniak RA, Guarente L, Neff NF (2000) Nuclear structure in normal and Bloom syndrome cells. Proc Natl Acad Sci U S A 97: 5214–5219. 16. Woo LL, Futami K, Shimamoto A, Furuichi Y, Frank KM (2006) The Rothmund-Thomson gene product RECQL4 localizes to the nucleolus in response to oxidative stress. Exp Cell Res 312: 3443–3457. 17. Heiss NS, Girod A, Salowsky R, Wiemann S, Pepperkok R, et al. (1999) Dyskerin localizes to the nucleolus and its mislocalization is unlikely to play a role in the pathogenesis of dyskeratosis congenita. Hum Mol Genet 8: 2515–2524. 18. Lipton JM, Ellis SR (2009) Diamond Blackfan anemia 2008–2009: broadening the scope of ribosome biogenesis disorders. Curr Opin Pediatr. 19. Andersen JS, Lam YW, Leung AK, Ong SE, Lyon CE, et al. (2005) Nucleolar proteome dynamics. Nature 433: 77–83. 20. Croft JA, Bridger JM, Boyle S, Perry P, Teague P, et al. (1999) Differences in the localization and morphology of chromosomes in the human nucleus. J Cell Biol 145: 1119–1131. 21. Guelen L, Pagie L, Brasset E, Meuleman W, Faza MB, et al. (2008) Domain organization of human chromosomes revealed by mapping of nuclear lamina interactions. Nature 453: 948–951. 22. Lefranc MP, Giudicelli V, Ginestoux C, Jabado-Michaloud J, Folch G, et al. (2009) IMGT, the international ImMunoGeneTics information system. Nucleic Acids Res 37: D1006–1012. 23. Chess A, Simon I, Cedar H, Axel R (1994) Allelic inactivation regulates olfactory receptor gene expression. Cell 78: 823–834.

PLoS Genetics | www.plosgenetics.org

10

March 2010 | Volume 6 | Issue 3 | e1000889


Nucleolus Genomics

49. Vogel MJ, Guelen L, de Wit E, Peric-Hupkes D, Loden M, et al. (2006) Human heterochromatin proteins form large domains containing KRAB-ZNF genes. Genome Res 16: 1493–1504. 50. Cremer M, Grasser F, Lanctot C, Muller S, Neusser M, et al. (2008) Multicolor 3D Fluorescence In Situ Hybridization for Imaging Interphase Chromosomes. Methods Mol Biol 463: 205–239. 51. Walter J, Joffe B, Bolzer A, Albiez H, Benedetti PA, et al. (2006) Towards many colors in FISH on 3D-preserved interphase nuclei. Cytogenet Genome Res 114: 367–378.

47. Sullivan GJ, Bridger JM, Cuthbert AP, Newbold RF, Bickmore WA, et al. (2001) Human acrocentric chromosomes with transcriptionally silent nucleolar organizer regions associate with nucleoli. Embo J 20: 2867–2874. 48. Al-Shahrour F, Minguez P, Tarraga J, Medina I, Alloza E, et al. (2007) FatiGO +: a functional profiling tool for genomic data. Integration of functional annotation, regulatory motifs and interaction data with microarray experiments. Nucleic Acids Res 35: W91–96.

PLoS Genetics | www.plosgenetics.org

11

March 2010 | Volume 6 | Issue 3 | e1000889


Nuclear Pore Proteins Nup153 and Megator Define Transcriptionally Active Regions in the Drosophila Genome

Juan M. Vaquerizas1☯, Ritsuko Suyama2☯, Jop Kind2☯, Kota Miura3, Nicholas M. Luscombe1,2¶*, Asifa Akhtar2,4¶*

1 European Bioinformatics Institute, Cambridge, United Kingdom, 2 Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany, 3 Centre for Molecular and Cellular Imaging, European Molecular Biology Laboratory, Heidelberg, Germany, 4 Laboratory of Chromatin Regulation, Max Planck Institute of Immunobiology, Freiburg, Germany

Abstract

Transcriptional regulation is one of the most important processes for modulating gene expression. Though much of this control is attributed to transcription factors, histones, and associated enzymes, it is increasingly apparent that the spatial organization of chromosomes within the nucleus has a profound effect on transcriptional activity. Studies in yeast indicate that the nuclear pore complex might promote transcription by recruiting chromatin to the nuclear periphery. In higher eukaryotes, however, it is not known whether such regulation has global significance. Here we establish nucleoporins as a major class of global regulators for gene expression in Drosophila melanogaster. Using chromatin-immunoprecipitation combined with microarray hybridisation, we show that Nup153 and Megator (Mtor) bind to 25% of the genome in continuous domains extending 10 kb to 500 kb. These Nucleoporin-Associated Regions (NARs) are dominated by markers for active transcription, including high RNA polymerase II occupancy and histone H4K16 acetylation. RNAi-mediated knockdown of Nup153 alters the expression of ~5,700 genes, with a pronounced down-regulatory effect within NARs. We find that nucleoporins play a central role in coordinating dosage compensation, an organism-wide process involving the doubling of expression of the male X chromosome. NARs are enriched on the male X chromosome and occupy 75% of this chromosome. Furthermore, Nup153-depletion abolishes the normal function of the male-specific dosage compensation complex. Finally, by extensive 3D imaging, we demonstrate that NARs contribute to gene expression control irrespective of their sub-nuclear localization. Therefore, we suggest that NAR-binding is used for chromosomal organization that enables gene expression control.

Citation: Vaquerizas JM, Suyama R, Kind J, Miura K, Luscombe NM, et al. (2010) Nuclear Pore Proteins Nup153 and Megator Define Transcriptionally Active Regions in the Drosophila Genome. PLoS Genet 6(2): e1000846. doi:10.1371/journal.pgen.1000846

Editor: Wolf Reik, The Babraham Institute, United Kingdom

Received November 10, 2009; Accepted January 14, 2010; Published February 12, 2010

Copyright: © 2010 Vaquerizas et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This work was supported by DFG SPP1129 and the EU funded FP7 Epigenome project. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: akhtar@immunbio.mpg.de (AA); luscombe@ebi.ac.uk (NML)

☯ These authors contributed equally to this work.

¶ These authors are joint senior authors on this work.

PLoS Genetics | www.plosgenetics.org

February 2010 | Volume 6 | Issue 2 | e1000846

Author Summary

The eukaryotic genome is spatially distributed in a highly organized manner, with chromosomal regions localizing to well-defined sub-nuclear positions. This organization could have a profound effect on chromatin accessibility and transcriptional activity on a genome-wide level. Using high-resolution, genome-wide, chromatin-binding profiles we show that the nuclear pore components Nup153 and Megator bind to a quarter of the Drosophila genome in the form of chromosomal domains. These domains represent active regions of the genome. Interestingly, comparison of male and female cells revealed enrichment of these domains on the male X chromosome, which represents an exceptionally active chromosome that is under dosage compensation control to equalize gene expression due to differences in X chromosome number between males and females. Based on extensive 3D image analysis, we show that these chromosomal domains are contributed by both peripheral and intranuclear pools of these proteins. We suggest that chromosomal organization by nucleoporins could contribute to global gene expression control.

Introduction

The spatial organisation of DNA, both at the nucleotide and chromosomal levels, allows efficient storage of genetic information inside the nucleus. However, DNA-dependent processes such as transcription require the chromosomal structure to be modified in order to allow access to this information. The regulation of chromatin accessibility is an intensely studied subject [1,2]. Molecular and genomic investigations have examined how nucleotide sequences and ATP-dependent chromatin-remodelling enzymes specify the locations for nucleosomal binding, and how histone-modifying enzymes modulate the stability of histone-nucleic acid interactions. These enzymes are recruited to precise genomic loci with the aid of sequence-specific DNA-binding transcription factors. In turn, particular histone modifications influence transcription factor-binding to target sites on the genome, so controlling transcriptional initiation. Despite the importance of these cis- and trans-acting factors on the local chromosomal environment and the transcription of nearby genes, it has become increasingly clear that they explain just one level at which chromatin is regulated [3,4].

The eukaryotic genome is spatially distributed in a highly organised manner, with entire chromosomal regions localising to well-defined sub-nuclear positions [5]. This organisation has a profound effect on chromatin accessibility and transcriptional activity on a genome-wide level [6–8]. For instance, chromosomal regions at the nuclear envelope tend to form closed heterochromatin, a structure that is generally indicative of transcriptional repression [9]. Genomic studies in Drosophila melanogaster and humans established that lamins (proteins lining the nuclear membrane [10]) are major contributors to sub-nuclear localisation and gene regulation [11,12]. Comparisons of binding profiles with gene expression data and histone marker information showed that chromosomal regions containing dense lamin-binding were transcriptionally repressed. Although the nuclear periphery has been primarily associated with repression, recent evidence has also suggested a role for membrane components in transcriptional activation [9,13–16].

The nuclear pore complex is a large structure comprising about 30 protein subunits, and it is the primary channel through which macromolecules traverse the nuclear envelope [17]. Interestingly, investigations in Saccharomyces cerevisiae identified subunits of the nuclear pore complex that preferentially bound transcriptionally active genes [18]. Moreover, several target loci such as GAL2 and INO1 were found to relocate from the interior to the periphery upon activation [13], although there were exceptions to this behaviour [19–22]. Thus, it is becoming increasingly clear that nuclear periphery components can have both positive and negative influences on gene regulation. Since the composition of the nuclear envelope differs between yeast and higher eukaryotes (yeast, for example, lacks lamins), it is important to also study the contribution of nuclear envelope components to gene regulation in higher organisms [9,17,23–25]. So far just one study has explored the global interactions of nucleoporin subunit Nup93 with human chromosomes 5, 7 and 16 [26]; the publication reported only a low density of binding sites, and their influence on gene regulation was inconclusive.

Recently, we revealed a biochemical association between nucleoporins and the dosage compensation apparatus in higher eukaryotes including humans [27]. In Drosophila, the Male Specific Lethal (MSL) complex offsets the imbalance in the number of sex chromosomes in males and females by doubling the expression of genes on the male X chromosome [28,29]. By purifying enzymatically active MOF complexes, we identified interactions with the nucleoporins Nup153 and Megator (Mtor). Strikingly, depletion of either subunit resulted in the loss of dosage compensation in male cells. Therefore, our work suggested a vital role for nucleoporins in promoting transcriptional activation on a large scale.

Here, we present the first genome-wide study of nucleoporin-binding in a higher eukaryote. Using chromatin immunoprecipitation followed by hybridisation to high-resolution tiling microarrays, we show that Nup153 and Mtor interact with 25% of the Drosophila genome in large domains spanning 10–500 kb in size. These regions, which we term nucleoporin associated regions (NARs), contain large numbers of highly expressed genes, and are enriched for markers of active transcription including RNA polymerase-binding and histone H4 lysine 16 acetylation. Additionally, we reveal a remarkably high density of NARs on the male X chromosome, which correlate extremely well with the binding pattern of the dosage compensation complex. We also demonstrate that chromosomal regions are bound by peripheral as well as non-peripheral pools of these proteins; interestingly, the X chromosomal target regions are preferentially localised closer to the nuclear periphery. In summary, we firmly establish nucleoporins as a major class of chromatin-binding proteins in higher eukaryotes, with a general role in transcriptional regulation and three-dimensional chromosomal organisation. Finally, we show for the first time the importance of nucleoporin-binding not only as a mechanism for transcriptional control, but also in maintaining a complex organism-level biological system, namely dosage compensation.

Results

Nup153 and Mtor bind chromatin in a genome-wide fashion

We produced DNA-binding profiles for nuclear pore components Mtor and Nup153 in Drosophila male SL-2 and female Kc cell lines using chromatin immunoprecipitation followed by hybridisation to Affymetrix tiling arrays [30,31] (Figure 1). Raw data were processed as in Kind et al. (2008) to minimise false-positive signals from aberrant array probes (Figure S1) [32]. The ChIP-chip profiles for the two proteins strongly correlate, indicating they bind to similar locations throughout the genome (r = 0.77 and 0.88 for SL-2 and Kc cells respectively; Figure 1D, Figure S4). We confirmed the reproducibility of results by performing three biological replicates for each condition (r = 0.73), and we validated binding at 18 control genes by real-time PCR in triplicate (Figure S2). Both Mtor and Nup153 exhibit extensive binding across the whole genome, and together they bind to 42% of the Drosophila genome (calculated as a fraction of base-pairs covered with twofold cut-off). Thus nucleoporins represent a new class of global chromatin-binding proteins for higher eukaryotes.
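The two summary statistics quoted above (the correlation between the Nup153 and Mtor profiles, and the fraction of base-pairs bound at a twofold cut-off) can be illustrated with a minimal sketch. The probe layout, the toy ratios, and the function names `pearson_r` and `combined_bound_fraction` are hypothetical, and the sketch assumes the cut-off is applied directly to the ChIP/input ratio of each probe; the actual processing pipeline is the one described in Kind et al. (2008) and is not reproduced here.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def combined_bound_fraction(probes, ratios_a, ratios_b, genome_length, cutoff=2.0):
    """Fraction of base-pairs covered by probes bound by either protein.

    probes: non-overlapping (start, end) intervals, one per array probe.
    ratios_a, ratios_b: ChIP/input ratios for the two proteins, same order.
    A probe counts as bound when either ratio reaches the cut-off (assumption).
    """
    covered = sum(
        end - start
        for (start, end), ra, rb in zip(probes, ratios_a, ratios_b)
        if ra >= cutoff or rb >= cutoff
    )
    return covered / genome_length


# Hypothetical toy data: four 100 bp probes on a 400 bp "genome".
probes = [(0, 100), (100, 200), (200, 300), (300, 400)]
nup153 = [2.5, 1.0, 3.0, 0.8]
mtor = [2.2, 1.1, 1.5, 0.9]

print(pearson_r(nup153, mtor))
print(combined_bound_fraction(probes, nup153, mtor, genome_length=400))
```

In this toy example two of the four probes pass the cut-off for at least one protein, so half of the base-pairs count as bound.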

Nucleoporin-binding occurs in large chromosomal domains

Visual inspection of the ChIP-chip profiles reveals that Nup153 and Mtor interact with the genome in a manner not observed for traditional transcription factors (Figure 1B and 1C) [33]. Instead of associating with discrete loci, nucleoporins bind extended chromosomal regions that alternate between domains of high-density binding and those of low occupancy. In order to analyse the visual observations in a statistically rigorous fashion, we quantified binding that takes place within a 10 kb sliding window that was scanned along the genome (see Materials and Methods). Windows containing more than 70% binding (as a proportion of array probes with positive binding signal) were classified as Nucleoporin Associated Regions (NARs), and neighbouring windows reaching this threshold were grouped together as continuous NARs. The detection method is robust: the 70% threshold ensures that no NARs are found when binding sites are randomly distributed across the genome, and we identify very similar sets of NARs for windows ranging from 5 kb to 500 kb in size. Moreover, application of the domain-finding approach described


Figure 1. Nup153 and Megator bind the Drosophila genome on a large scale. (A) Karyotype representation of the Drosophila genome; the upper track depicts the occurrence of high-density nucleoporin-binding in SL-2 cells and the lower track shows the location of annotated genes. Termed Nucleoporin Associated Regions (NARs), high-density binding occurs across 25% of the genome and there is particularly high occupancy on the male X chromosome. (B) Magnified view of Nup153 and Mtor-binding on chromosome 3L. For each nucleoporin, the upper track displays the processed ChIP/input profile and the lower track colours the sections identified as NARs. Note that Nup153 and Mtor show very similar patterns of binding. (C) Magnified view of nucleoporin-binding and NAR occurrence on chromosome X. There is much denser binding on this chromosome compared with autosomes. (D) Smoothed scatter plot displaying the ChIP/input binding ratios for Nup153 and Mtor (r = 0.77). (E) Barplot representing the overlap in NARs defined by Nup153 and Mtor binding profiles. (F) Histogram of Nup153 and Mtor NAR length distributions. doi:10.1371/journal.pgen.1000846.g001
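The sliding-window definition of NARs given in the Results (a 10 kb window, more than 70% of probes with positive binding signal, neighbouring windows merged) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the step size, the probe representation, and the name `find_nars` are hypothetical, and the Materials and Methods should be consulted for the actual procedure.

```python
def find_nars(probe_positions, probe_bound, window=10_000, step=1_000, threshold=0.7):
    """Return merged (start, end) intervals classified as NARs.

    probe_positions: sorted probe coordinates on one chromosome.
    probe_bound: booleans, True where the probe shows a positive binding signal.
    window, threshold: 10 kb and 70%, as stated in the text.
    step: slide increment; a hypothetical choice, not given in the text.
    """
    if not probe_positions:
        return []
    nars = []
    last = probe_positions[-1]
    start = probe_positions[0]
    while start <= last:
        end = start + window
        in_window = [b for p, b in zip(probe_positions, probe_bound) if start <= p < end]
        # A window qualifies when more than `threshold` of its probes are bound.
        if in_window and sum(in_window) / len(in_window) > threshold:
            if nars and start <= nars[-1][1]:
                # Overlapping or adjacent qualifying windows form one continuous NAR.
                nars[-1] = (nars[-1][0], max(nars[-1][1], end))
            else:
                nars.append((start, end))
        start += step
    return nars


# Hypothetical toy chromosome: one probe every 500 bp, bound between 10 kb and 30 kb.
positions = list(range(0, 40_000, 500))
bound = [10_000 <= p < 30_000 for p in positions]
print(find_nars(positions, bound, window=10_000, step=5_000))
```

Rescanning each window over all probes keeps the sketch short; a real tiling-array analysis would use a running count instead.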


by Guelen et al. [11] returns over 80% agreement with our method (in terms of base-pairs classified as NARs). There is considerable NAR-occurrence (Figure 1A–1C); in male SL-2 cells, a total of 1,384 NARs cover a quarter of the entire Drosophila genome (25 Mb and 29 Mb for Nup153 and Mtor respectively) and in female Kc cells 1,865 NARs occupy a similar proportion of the genome (33 Mb and 35 Mb for Nup153 and Mtor respectively; Figure S3). Most domains range in size from 10 kb to 100 kb, although some even extend to over 500 kb (Figure 1F, Figure S4). Most nucleotide positions within NARs are occupied by both Nup153 and Mtor. Moreover, even where the overlap is not perfect, NARs tend to occur in similar genomic loci (Figure 1E; Figure 1B chromosomal positions 560,000–600,000). Most importantly, NARs occur in gene-rich areas that encompass over 4,700 protein-coding genes whose activities might be affected by nucleoporin-binding.

Nucleoporin-binding demarcates actively transcribed chromosomal regions

A direct relationship between nucleoporin-binding and gene expression has not been established so far in higher eukaryotes. Therefore, we explored the impact of NARs on transcriptional regulation by examining the activity of genes encoded within these regions (Figure 2; Tables S1, S2). We measured gene expression levels using Affymetrix GeneChips (see Materials and Methods). Using the present/absent calls defined by the MAS5.0 algorithm [34], we detected the expression of 6,478 and 6,219 genes in SL-2 and Kc cells respectively. These genes are preferentially located within NARs: 63% of genes inside NARs are expressed compared with just 40% outside, indicating a significantly elevated transcriptional activity in the former (p-value < 2.2e-16). This observation is supported by data quantifying RNA polymerase II-occupancy (Figure 2; Tables S1, S2); by mapping publicly available ChIP-chip data [35], we find that Pol II-binding is highly enriched at the promoters of genes inside NARs compared with those outside (p-value < 2.2e-16). Recent publications demonstrated that histone modifications, MOF acetyltransferase- and lamin-binding are robust genome-wide indicators of transcriptional activity. In both SL-2 and Kc cells, acetylated histone H4 lysine 16 (H4K16Ac) and MOF-binding [32], both strong markers for active transcription, are extremely prominent within NARs (Figure 2; Tables S1, S2; p-value

[Figure 2 tracks, chromosome 2L: NARs (Nup153 NARs, Mtor NARs); active marks (H4K16Ac, MOF, Pol II); gene expression; Nup153 RNAi; repressive marks (Lamin, H3K27me3). Gene legend: present, up-regulated, down-regulated. x-axis: chromosomal location (Mb), 3.0 to 4.0.]

Figure 2. NARs define transcriptionally active regions of the genome. Genome-track view of a 1 Mb section on chromosome 2L. NARs are enriched for transcribed genes compared with non-NARs (gene expression track; green shading), and a large proportion of genes are down-regulated upon Nup153-depletion (Nup153 RNAi track; red shading). NARs also align with markers of a transcriptionally active chromatin structure (H4K16Ac, MOF and Pol II tracks; grey shading), but exclude markers for inactive chromatin (lamin, H3K27me3; grey shading). doi:10.1371/journal.pgen.1000846.g002


< 2.2e-16). In contrast, histone H3 lysine 27 tri-methylation [36] and lamin-binding [12], both markers of transcriptional repression, are enriched outside NARs (Figure 2, Figure S5; Tables S1, S2; p-value < 2.2e-16). Finally, we confirmed a causal link between nucleoporin-binding and transcriptional regulation by measuring gene expression levels following RNAi-mediated knock-down of Nup153 (Figure 2, Figure S7; Tables S1, S2). The depletion results in large and wide-spread transcriptional changes in cells collected after seven days: 5,684 genes (~40% of Drosophila genes represented on the array) are differentially expressed in SL-2 cells (p-value < 0.05). Moreover, there is a large enrichment of down-regulated genes within NARs (29% of all genes; 40% of 'present' genes) compared with non-NARs (19% of all genes; p-value < 2.2e-16). We obtain similar enrichments for cells collected five days after RNAi-treatment, and also upon Mtor-depletion (data not shown). These observations strongly indicate that nucleoporin-binding promotes a high level of transcriptional activity, which may be due to the formation of an open chromatin environment.
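The enrichment comparisons above (for example, 29% of genes down-regulated inside NARs versus 19% outside) are comparisons of two proportions. The text does not state which test produced the reported p-values, so the sketch below uses a Pearson chi-square test on a 2x2 table purely as an illustration. The gene counts are hypothetical, chosen only to match the quoted percentages, and the function name `chi_square_2x2` is likewise an assumption.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value). With one degree of freedom the upper-tail
    probability of the chi-square distribution equals erfc(sqrt(statistic / 2)).
    """
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    statistic = numerator / denominator
    return statistic, erfc(sqrt(statistic / 2))


# Hypothetical counts, for illustration only:
# 4,700 genes inside NARs, 29% down-regulated; 9,000 genes outside, 19% down-regulated.
down_in, other_in = 1363, 3337
down_out, other_out = 1710, 7290

statistic, p_value = chi_square_2x2(down_in, other_in, down_out, other_out)
print(statistic, p_value)
```

A bound such as "p-value < 2.2e-16" is what statistical software typically prints when a p-value falls below double-precision machine epsilon, so any table with a difference this large at these sample sizes will report it.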

NARs are enriched on the male X chromosome

One of the most important manifestations of gene expression control in higher eukaryotes is dosage compensation for the different numbers of sex chromosomes in the two sexes. In Drosophila, in which females have two X chromosomes but males possess only a single X, the dosage compensation complex offsets the imbalance in gene content by doubling the expression of the male X chromosome. Thus, the chromosome represents an outstanding example of an exceptionally highly transcribed genomic region. In order to explore the association of Nup153 and Mtor with the dosage compensation complex, we compared the patterns of nucleoporin-binding in male SL-2 and female Kc cells (Figure 1A, Figure 3A–3D, Figure S3). There is a dramatic difference between the two sexes: in females, NARs are evenly distributed throughout the entire genome with only a 1.2-fold difference in % NAR occupancy between chromosome X (7.4 Mb and 33% for Nup153; 8.0 Mb and 36% for Mtor) and autosomes (26.0 Mb and 27% for Nup153; 27.1 Mb and 28% for Mtor); but in males, NARs are overwhelmingly biased towards the X chromosome (14.9 Mb and 67% for Nup153; 16.6 Mb and 75% for Mtor) compared with the autosomes (9.7 Mb and 10% for Nup153; 12.0 Mb and 12% for Mtor) with a 6-fold difference in occupancy. Further, domains on the male X chromosome (median length = 62 kb and 94 kb for Nup153 and Mtor respectively) are much longer than those found on any other chromosomes (median length = 22 kb for Nup153 and Mtor in male autosomes, ~35 kb for female autosomes and X chromosome).

Having established that the nucleoporins are enriched on the male X chromosome, we explored the association with the dosage compensation system further. Recently, we demonstrated that the members of the dosage compensation complex (MSL1, MSL3 and MOF) preferentially bind to the male X chromosome [32]. A comparison of this previously published dataset with our current analysis shows that NARs on the male X chromosome coincide very well with the binding sites of the dosage compensation complex (Figure 3E). We also tested the effects of Nup153-depletion on MSL1 and MOF-binding to 10 known target loci using chromatin-immunoprecipitation followed by qPCR. X-chromosomal binding is severely reduced for both proteins (Figure 3F), and the additional binding to autosomal targets is lost for MOF (Figure S8). The effects are clearly specific to Nup153, as depleting another nucleoporin, Nup50, does not influence MSL1 and MOF localisation and binding (Figure S9; data not shown). Moreover, the observations are not due to an effect on MSL protein concentrations or defects in the RNA export pathway [27]: we previously showed that MSL levels remain unaffected in Nup153 and Mtor-depleted cells; and impairment of the major export pathways through NFX1-depletion does not disrupt the localisation of the MSL complex to the X chromosomes.

Spatial localisation of NARs versus non-NARs in the nucleus

Although nucleoporins are primarily located at the nuclear periphery, some display dynamic association with the nuclear pore complex [37], and it remains unclear whether nucleoporin-chromatin interactions would affect transcription at the periphery or within the nucleoplasm. Therefore, we assessed the spatial localisation of different chromosomal regions within the nucleus using three-dimensional imaging of Fluorescence In Situ Hybridisation (3D-FISH) in male and female cells (Figure 4). We selected 26 chromosomal regions of average length 15–20 kb for analysis (Table S3), comprising 18 NAR (targets T1-18) and 8 non-NAR loci (targets N1-8). An independent lamin-bound locus (target L105) was used as a positive control representing a region previously shown to localise at the nuclear periphery [12].

First we checked the localisation of Nup153 and Mtor themselves (Figure S6). Immunostaining of SL-2 cells and salivary glands from male larvae confirms that both proteins predominantly reside in the nuclear periphery, although we also detected some staining within the nucleus. This is consistent with earlier reports that these proteins are dynamic components of the nuclear pore complex, with the capacity to shuttle between different sub-nuclear locations [25,37].

Next, we used DAPI and lamin protein-immunostaining to assess the nuclear localisation of our target loci. We display a selection of images in Figure 4A: the lamin protein in green defines the nuclear boundary, the DAPI in blue the distribution of genomic DNA, and the FISH signal in red specifies the position of the target locus. In order to account for cell-to-cell variation in localisation that results from the dynamic behaviour of chromatin, we measured the distance between the FISH signal and nuclear boundary for a large number of samples (44 < n < 91). Size differences between nuclei were normalised by representing distances as a percentage of the nuclear radius. In Figure 4B, we show the expected distribution of distances for a simulated locus situated at the periphery; for a FISH signal with 30% radius, we find that most measurements lie between 0% and 30% of the distance to the centre of the nucleus. In contrast, simulations for a signal positioned halfway between the periphery and the centre result in a distinct, more symmetrically shaped distribution, with most measurements falling between 20% and 60% of the distance to the centre (Figure 4C; Figures S10, S11; Videos S1, S2, S3, S4).

The lamin-bound L105 locus displays a distribution that is heavily skewed towards the periphery (Figure 4D); however the profile is broader than the simulation, signifying that the locus is present at the interior of the nucleus at least part of the time. On the other hand, target N2 resembles the non-peripheral simulation (Figure 4E), albeit with a broader distribution, which indicates that the locus predominantly resides in the interior. Since both loci are NAR-independent, they were assigned as in vivo controls representing peripheral and non-peripheral localisation. Many NAR-target distributions show almost perfect overlap with L105, demonstrating that they are preferentially situated at the periphery (Figure 4F–4G; see Materials and Methods); interestingly however a subset of NAR loci displays distributions


Figure 3. Male X chromosome is especially enriched for NARs. Percentages of NAR occupancy on male and female autosomes and X chromosome for (A) Nup153 and (C) Mtor. In males, NARs are particularly enriched on the X chromosome compared with autosomes, whereas NARs occur evenly throughout in females. NAR length distributions for (B) Nup153 and (D) Mtor. NARs are much longer on the male X chromosome. (E) Overlap between NARs and MSL1-, MSL3- and MOF-binding; numbers represent gene counts. (F) Effect of Nup153-depletion on MSL1- (red shading) and MOF-binding (grey shading) to four X-chromosomal target loci. DNA prepared from cells treated with EGFP (control) or Nup153 dsRNA was immunoprecipitated and analysed by qPCR using primers for the beginning (P1), middle (P2) and end (P3) of genes. Error bars represent the standard deviation in measurements from three replicate experiments. Recovered DNA is shown as a percentage of input DNA. doi:10.1371/journal.pgen.1000846.g003
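The fold differences in NAR occupancy between the X chromosome and autosomes quoted in the Results can be checked with simple arithmetic. The sketch below uses only the percentages given in the text; the dictionary layout and the name `fold_difference` are illustrative choices.

```python
# NAR occupancy (% of chromosome covered) as quoted in the Results:
# (X chromosome, autosomes) for each sex and nucleoporin.
occupancy = {
    ("male", "Nup153"): (67, 10),
    ("male", "Mtor"): (75, 12),
    ("female", "Nup153"): (33, 27),
    ("female", "Mtor"): (36, 28),
}


def fold_difference(x_percent, autosome_percent):
    """Ratio of X-chromosomal to autosomal NAR occupancy."""
    return x_percent / autosome_percent


folds = {key: fold_difference(*values) for key, values in occupancy.items()}
for (sex, protein), fold in folds.items():
    print(f"{sex:6s} {protein:6s} X/autosome occupancy: {fold:.2f}-fold")
```

The male ratios come out a little above six and the female ratios a little above one, in line with the approximately 6-fold and 1.2-fold differences reported.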


Figure 4. Nup153 and Mtor define NARs both at the periphery and the interior of the nucleus. (A) Representative images of single confocal sections of nuclei containing the FISH signal (red) over DAPI (blue) and immunostained lamin (green). Target genomic regions include a lamin-bound gene (L105), NAR (T4, T15, T7, T11, T9, T13) and non-NAR loci (N1, N2). Probability density plots show the distribution of distance measurements between the FISH signal and the closest point on the nuclear boundary. Simulated nuclei show the ideal distributions for FISH targets located at the (B) periphery and (C) interior. Distances range from 0 at the boundary and 1.0 at the centroid of the nucleus. The grey background represents the theoretical 30% limit for a peripherally localised FISH signal. Observed distributions of in vivo controls for (D) peripherally localised L105 and (E) non-peripherally localised N2; the broad spread compared with simulations indicate that the loci display dynamic behaviour in their positioning within the nucleus. (F, G) Predominantly peripheral loci (T4, T15) have distributions that are similar to L105 (shown in yellow), whereas (H, I) predominantly non-peripheral loci (N1, T11) have very different distributions. Aggregate distributions for all NAR targets on (J) the male X chromosome and (L) autosomes, and all non-NAR targets on (K) the male X and (M) autosomes. Targets on the X chromosome are peripherally localised compared with autosomal ones. doi:10.1371/journal.pgen.1000846.g004
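The normalised distance used in these plots (0 at the nuclear boundary, 1.0 at the centroid) can be sketched for an idealised nucleus. The sketch assumes a perfectly spherical nucleus of known centre and radius, which is a simplification: in the study the boundary is defined by lamin immunostaining, and the image-analysis procedure is not reproduced here. The function names and the example coordinates are hypothetical.

```python
import math


def normalised_boundary_distance(signal, centre, radius):
    """Distance from a FISH signal to the nearest boundary point, as a fraction of the radius.

    For a sphere the nearest boundary point lies along the line through the
    centre, so the distance is radius minus the distance from the centre.
    Returns 0.0 at the boundary and 1.0 at the centroid.
    """
    from_centre = math.dist(signal, centre)
    return (radius - from_centre) / radius


def fraction_peripheral(distances, limit=0.3):
    """Share of measurements within the 30% peripheral limit shown in the figure."""
    return sum(d <= limit for d in distances) / len(distances)


# Hypothetical example: nucleus of radius 5 (arbitrary units) centred at the origin.
d = normalised_boundary_distance((0.0, 0.0, 4.0), (0.0, 0.0, 0.0), 5.0)
print(d)
print(fraction_peripheral([0.1, 0.2, 0.5, 0.9]))
```

Expressing distances as a fraction of the radius is what allows nuclei of different sizes to be pooled into a single distribution per locus.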

that are indicative of non-peripheral localisation (Figure 4I). For non-NARs, targets such as N1 display good overlap with the negative control N2 (Figure 4H), but some are found at the periphery. It is clear, therefore, that many target regions tested here do not conform to the behaviour expected from NPC-binding. In fact, we find that NARs from chromosome X tend to reside at the periphery (6 out of 10 targets; Table S4), whereas only a small number of autosomal NARs do so (1 out of 8; Table S4). This is reflected in the aggregate distributions, in which X-chromosomal loci display the characteristic skewed profiles compared with autosomal regions (Figure 4J and 4L). Among non-NARs (Figure 4K and 4M), autosomal loci are invariably non-peripheral, whereas the X chromosomal targets display a tendency for peripheral localisation; the positioning of the latter is probably influenced by neighbouring NARs, as there is such a large amount of binding on the X chromosome. For comparison, peripheral localisation of the X chromosome is absent in female Kc cells (data not shown).

Thus, in striking contrast to prior expectations, we reveal that interior as well as peripheral populations of nucleoporins bind chromatin and mediate transcriptional activity at NARs. Furthermore, interactions with the X chromosome promote peripheral localisation of the chromosome, most likely as a result of the overwhelming amount of binding in males, but this is generally not the case for autosomes.

Finally, to confirm the influence of nucleoporins on localisation, we tested the effects of RNAi-mediated Nup153-knockdown for six loci: three peripheral X chromosomal NARs (T4, T5, T7), a non-peripheral X chromosomal NAR (T11), a non-peripheral autosomal NAR (T9) and the non-peripheral control (N2). For each we compared the distribution of Nup153-depleted samples against a mock EGFP RNAi-treatment (Figure 5, Figure S7). All


Figure 5. Peripheral localisation is dependent on Nup153. Probability density distributions of distance measurements for mock treated (red) and Nup153-depleted cells (purple). Histograms depict the proportion of nuclei for which the FISH signal is located within the 30% distance threshold (DAPI in blue). (A-C) NAR targets on the male X chromosome (T4, T7, T5) relocalise to the interior upon treatment, indicating that peripheral localisation is dependent on Nup153. (D-F) NAR and non-NAR targets at the interior remain unaffected upon Nup153-depletion. doi:10.1371/journal.pgen.1000846.g005
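As a rough illustration of the quantity plotted in these histograms, the proportion of nuclei with a peripheral FISH signal can be computed from normalised distances (0 at the nuclear boundary, 1.0 at the centroid, as in Figure 4). The function names, the division by the boundary distance of the centroid, and the inclusive treatment of the 30% cut-off are assumptions made for this sketch; they are not details taken from the analysis itself.

```python
def normalised_distance(dist_to_boundary, dist_at_centroid):
    """Scale a boundary distance so that 0 is the boundary and 1.0 the centroid.

    Assumption: normalisation divides by the boundary distance of the centroid.
    """
    if dist_at_centroid <= 0:
        raise ValueError("centroid distance must be positive")
    return dist_to_boundary / dist_at_centroid


def peripheral_fraction(norm_distances, threshold=0.3):
    """Proportion of nuclei whose FISH signal lies within the threshold.

    Assumption: a value exactly equal to the threshold counts as peripheral.
    """
    if not norm_distances:
        raise ValueError("no nuclei supplied")
    within = sum(1 for d in norm_distances if d <= threshold)
    return within / len(norm_distances)


# Hypothetical per-nucleus mean distances, already normalised.
mock = [0.10, 0.25, 0.30, 0.50, 0.90]
print(peripheral_fraction(mock))  # 3 of the 5 nuclei fall within the 30% limit
```

Comparing this fraction between mock-treated and Nup153-depleted nuclei gives the kind of contrast the histograms summarise.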

organisation of chromosomes within the nucleus is increasingly considered to have a profound effect on chromatin structure and transcriptional activity [5]. In particular, studies in yeast indicate that members of the nuclear pore complex might promote transcription by recruiting chromatin to the nuclear periphery [14,18]. However, the importance of such regulation in higher eukaryotes has remained unresolved [26]. In this study, we established conclusively that nucleoporins play a central role in mediating transcriptional regulation in a complex, multicellular organism. For the first time in any higher eukaryote, we generated a genome-wide profile of nucleoporin-binding; contrary to preliminary observations, binding is widespread, occurring across 40% of the genome. Thus, we reveal that nucleoporins—Nup153 and Mtor in particular—represent a major new class of global chromatin-binding proteins. Intriguingly, these proteins interact with the genome differently to traditional transcription factors. Rather than associate with individual loci, nucleoporins bind continuous sections of chromo-

three peripheral targets on the X chromosome displace to a more intra-nuclear position upon loss of Nup153 (Figure 5A–5C; p-value < 0.05), but in contrast there was no significant change for any of the non-peripheral loci (Figure 5D–5F; p-value > 0.05). These data suggest that the sub-nuclear positioning of peripheral NARs, specifically those on the male X, depends on the presence of Nup153, whereas the localisation of intra-nuclear loci is independent of Nup153 regardless of whether they are bound by nucleoporins.

Discussion

The classical view of transcriptional regulation describes the interplay of transcription factors, histones and associated enzymes with DNA in order to recruit the transcriptional machinery to the appropriate genomic loci. However, it has become increasingly clear that these interactions explain only one level at which gene expression is controlled. At a genome-wide level, the spatial


at least 80% of all probes defined as domains by the Guelen approach.

somes at very high density. Termed NARs, these regions extend up to 500 kb in length and occupy 25% of the entire Drosophila genome. Moreover, NARs are functionally important as they demarcate regions of open chromatin and transcriptional activity, which is lost on depletion of Nup153. It is significant that the male X chromosome, a prime example for hyper-transcription, is almost entirely occupied by NARs. Therefore, we suggest that Nup153 and Mtor may stimulate transcription by promoting the formation of an open chromatin environment.

In dramatic contrast to expectations, nucleoporin-binding does not automatically lead to localisation at the nuclear periphery, though the male X chromosome is an exception in this regard. Since Nup153 and Mtor are known to be dynamic components of the nuclear pore complex, it appears likely that both peripheral and intra-nuclear pools of nucleoporins contribute to chromatin-binding. Given the dynamic nature of chromatin-localisation, it is also possible that NARs are located at the periphery in a very transient manner, and further developments in imaging techniques will help clarify this. Where NAR-formation and peripheral localisation do coincide, however, Nup153 is necessary for sustained positioning.

Chromosomal domains have been implicated in the formation of three-dimensional sub-nuclear structures to coordinate the expression of otherwise distant loci [38] such as the human beta-globin genes [39,40]. We speculate that NARs may indicate the genomic regions required for the assembly of these transcription factories on a very large scale. Within this context, the dynamic nature of Nup153 and Mtor is significant, as re-localisation of these proteins might allow a basis for global transcriptional control in response to cellular cues.
Additionally, given the primary function of the nuclear pore complex in transporting macromolecules to and from the nucleus, Nup153 and Mtor may provide a means to couple transcriptional control with post-transcriptional events. We stress, however, that the mechanisms behind such processes are the subject of intense research activity and many controversies remain. Finally, the special link with dosage compensation confirms the importance of nucleoporin-binding not only as a molecular mechanism for transcriptional control, but also in maintaining a complex, organism-level biological system.

RNAi on cultured cells

Nup153 and Nup50 were depleted as previously described in Mendjan et al [27]. Briefly, cells were incubated with dsRNA for five or seven days with a boost on day two. Cells were subsequently harvested for Western blot analysis, ChIP, gene expression profiling, or immunofluorescence experiments. Control experiments were performed using mock treatment (EGFP RNAi).

Gene expression profiling

Gene expression was measured using Affymetrix Drosophila2 GeneChips in triplicate for each condition. Data analysis was performed using publicly available packages in the BioConductor Software Suite [43]. Raw .CEL files were processed using RMA [44] and probe-sets were mapped to genes using the annotation from the Ensembl database (v41) [45]. In control (EGFP-treated) cells, expressed genes were identified as those with MAS5.0 ‘present’ calls in all three replicates [34]. For comparisons of Nup153-depleted and mock-treated cells, differentially expressed genes were determined using the Limma package [46]; p-values were corrected for multiple testing using FDR [47] and a significance threshold of p-value < 0.05 was selected.
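The multiple-testing correction cited above [47] is the Benjamini-Hochberg procedure. As a minimal sketch of that adjustment followed by the 0.05 cut-off, the following may help; it is not the Limma implementation, and the function name and the example p-values are invented for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for five genes.
raw = [0.001, 0.008, 0.039, 0.041, 0.6]
adjusted = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adjusted]
print(significant)
```

Note that two of the raw p-values below 0.05 no longer pass after adjustment, which is the intended effect of controlling the false discovery rate.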

Overlap of NARs with markers for transcriptional activity

We compared the overlap between NARs and genomic features. For ease of comparison, all data were mapped onto the Drosophila genome provided by the Ensembl database (v. 41) [45]. Accompanying each entry is the statistical significance of the difference in the amount of genomic feature found within NARs and non-NARs.
(i) Histone H4 lysine 16 acetylation (H4K16Ac; p-value < 2.2e-16; t-test): processed ChIP-chip profiles obtained from Kind et al [32].
(ii) MOF-binding (p-value < 2.2e-16; Fisher test): processed ChIP-chip profiles obtained from Kind et al [32].
(iii) RNA PolII-occupancy (p-value < 2.2e-16; Fisher test): PolII-bound genes obtained from Muse et al [35]. For visualisation purposes in Figure 2, bound genes were represented as 1 kb windows centred on the transcription start site.
(iv) Gene density (p-value < 2.2e-16; Wilcoxon test): number of genes as annotated by the Ensembl database within a 20 kb sliding window with a 1 kb offset.
(v) Expressed genes (p-value < 2.2e-16; Fisher test): gene expression measured using Affymetrix Drosophila2 GeneChips as described above.
(vi) Down-regulated genes upon Nup153-depletion (p-value < 2.2e-16; Fisher test): differentially expressed genes in RNAi-treated cells compared with untreated cells as described above.
(vii) Lamin-binding (p-value < 2.2e-16; Fisher test): processed ChIP-chip data were obtained from Pickersgill et al [12]. Note that the study used low-resolution cDNA arrays, and therefore unlike the human study, the authors were unable to detect high-density lamin-associated domains.
(viii) Histone H3 lysine 27 tri-methylation (H3K27me3; p-value < 2.2e-16; Fisher test): processed ChIP-chip profiles obtained from Schwartz et al [36].
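The gene density measure in item (iv) can be sketched as a simple windowed count. The text does not say how a gene is assigned to a window, so this sketch assumes a gene is counted by its start coordinate and that windows are half-open intervals; the function name and the example coordinates are hypothetical.

```python
from bisect import bisect_left


def gene_density(gene_starts, chrom_length, window=20_000, step=1_000):
    """Count genes per sliding window along one chromosome.

    Assumption: a gene belongs to a window if its start coordinate lies in
    the half-open interval [window_start, window_start + window).
    Returns a list of (window_start, gene_count) tuples.
    """
    starts = sorted(gene_starts)
    counts = []
    last_start = max(chrom_length - window, 0)
    for window_start in range(0, last_start + 1, step):
        lo = bisect_left(starts, window_start)
        hi = bisect_left(starts, window_start + window)
        counts.append((window_start, hi - lo))
    return counts


# Hypothetical gene start positions on a 30 kb toy chromosome.
toy_genes = [500, 1_500, 19_999, 20_000, 25_000]
for window_start, n_genes in gene_density(toy_genes, chrom_length=30_000):
    print(window_start, n_genes)
```

The per-window counts inside NARs and outside them could then be compared with a rank-based test, as the Wilcoxon test named in item (iv) suggests.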

Materials and Methods

ChIP–chip and qPCR analysis

Chromatin immunoprecipitation combined with microarray hybridisation (ChIP-chip) and qPCR experiments were performed as described previously in Kind et al [32]. Primer sequences are provided in Text S1. Numerical data from Affymetrix Drosophila Tiling 2.0R Arrays (Dm35b_MR_v02) were processed as in Kind et al [32]. Briefly, array data were background corrected using GCRMA and quantile normalised [41]. Log2 (ChIP/input) ratios were calculated using the average from three replicate experiments. Log2 ratios were then smoothed by averaging the signal within a 500 bp window centred on each probe (Figure S1).
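The ratio and smoothing steps above can be sketched as follows. This is a simplified illustration rather than the processing code used for the arrays: it assumes linear-scale intensities, averages the replicates before taking the ratio (as the Figure S1 legend describes), and treats the 500 bp window as inclusive at both ends. All function names and example values are hypothetical.

```python
import math
from bisect import bisect_left, bisect_right


def log2_ratios(chip_replicates, input_replicates):
    """Per-probe log2(mean ChIP intensity / mean input intensity).

    Each argument is a list of replicates; each replicate is a list of
    linear-scale probe intensities in the same probe order (an assumption).
    """
    ratios = []
    for chip, inp in zip(zip(*chip_replicates), zip(*input_replicates)):
        mean_chip = sum(chip) / len(chip)
        mean_input = sum(inp) / len(inp)
        ratios.append(math.log2(mean_chip / mean_input))
    return ratios


def smooth(positions, values, window=500):
    """Average the values of all probes within window/2 of each probe.

    positions must be sorted in ascending order along the chromosome.
    """
    half = window / 2
    smoothed = []
    for pos in positions:
        lo = bisect_left(positions, pos - half)
        hi = bisect_right(positions, pos + half)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


# Hypothetical probe start positions and log2 ratios.
probe_positions = [0, 100, 300, 700, 1000]
probe_ratios = [1.0, 2.0, 3.0, -1.0, -2.0]
print(smooth(probe_positions, probe_ratios))
```

Because each window always contains its own centre probe, the division in `smooth` never sees an empty window.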

Identification of Nucleoporin Associated Regions (NARs)

Chromosomal regions with high densities of Nup153- and Mtor-binding were identified by sliding a 10 kb window along each chromosome, centred on the start position of each probe. NARs were defined as continuous chromosomal regions containing positive binding signal (i.e., log2 ratio > 0) for more than 70% of probes. We also implemented the two-stage domain-finding method described by Guelen et al [11]. Our method recovered

Fluorescent In Situ Hybridisation on cultured cells

DNA FISH on SL-2 cells was performed as previously described by Lanzuolo et al [48]. Briefly, for DNA FISH, 1×10⁶ cells were centrifuged, re-suspended in 0.4 ml of medium and placed for 30 min at room temperature on a poly-lysine-coated slide (10 mm diameter). After rinsing with PBS, the cells were fixed with 4% paraformaldehyde in PBT (PBS, 0.1% Tween 20) for 10 min at


results were obtained when we used the centre of mass of the FISH signal as the reference point instead of the mean distances for individual voxels (data not shown). In total, we examined 1,712 nuclei (35–91 samples for each target locus; total 1,172 nuclei for NARs; total 540 nuclei for non-NARs). For a given target, we compiled all distance measurements from all relevant nuclei to produce a distribution of distances as shown in Figure 4 and Table S4. The lamin L105 and N2 non-NAR targets were selected as in vivo controls with representative distributions for peripheral and non-peripheral sub-nuclear localisation. We assessed the localisation of each target locus by comparing its distance measurements against the L105 and N2 controls separately. Statistical significance was calculated using the Wilcoxon test, with an FDR-corrected threshold of p-value < 0.05. Briefly, a non-significant p-value (i.e., p-value > 0.05) compared with the L105 distribution is indicative of peripheral localisation, whereas a non-significant p-value (i.e., p-value > 0.05) compared with the N2 distribution is indicative of non-peripheral localisation.
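The comparison logic described above can be sketched with a plain rank-sum test. This is an illustrative stand-in rather than the analysis code: it uses the normal approximation with a tie correction and no continuity correction, it omits the FDR adjustment across targets, and the "ambiguous" outcome is an addition of this sketch for cases the text does not cover. All names and example distances are hypothetical.

```python
import math


def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank-sum p-value using the normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which sample each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + 1 + j) / 2  # average of ranks i+1 .. j
        ties = j - i
        tie_term += ties ** 3 - ties
        rank_sum_x += mid_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    return math.erfc(abs(z) / math.sqrt(2))


def classify(target, peripheral_ctrl, interior_ctrl, alpha=0.05):
    """Label a locus by which control distribution it cannot be told apart from."""
    like_periphery = rank_sum_p(target, peripheral_ctrl) > alpha
    like_interior = rank_sum_p(target, interior_ctrl) > alpha
    if like_periphery and not like_interior:
        return "peripheral"
    if like_interior and not like_periphery:
        return "non-peripheral"
    return "ambiguous"  # not part of the published scheme


# Hypothetical normalised distances (0 = boundary, 1 = centroid).
l105_like = [i / 100 for i in range(2, 42, 2)]
n2_like = [i / 100 for i in range(52, 92, 2)]
locus = [i / 100 for i in range(3, 43, 2)]
print(classify(locus, l105_like, n2_like))
```

In practice the p-values from all targets would be collected and FDR-corrected before applying the threshold.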

room temperature. Cells were then washed three times with PBT and incubated for 1 h at room temperature with RNAse A (100 µg/ml in PBT). After rinsing with PBS, cells were incubated with 0.5% Triton in PBS for 10 min at room temperature. Cells were rinsed again with PBS and incubated with 20% glycerol in PBS for 30 min at room temperature. Cells were then frozen in liquid nitrogen, thawed at room temperature and soaked in 20% glycerol in PBS, repeatedly four times. After washing the cells again with PBS three times, they were incubated for 5 min in 0.1 N HCl, briefly rinsed in 2× SSC twice, and stored in 50% formamide, 2× SSC, 10% dextran sulphate, pH 7.0. Fluorescent probes were prepared with the FISH Tag DNA Kit (Invitrogen, Carlsbad, CA), dissolved in the hybridization mixture (50% deionized formamide, 2× SSC, 10% dextran sulphate, salmon sperm DNA at 0.5 mg/ml), applied to cells and sealed under coverslips with rubber cement. Probe and cellular DNA were denatured simultaneously on a hot block at 78 °C for 3 min. Hybridization was carried out in a humid atmosphere at 37 °C for 1 d. After hybridization, slides were washed in 2× SSC three times for 5 min at 37 °C, and in 0.1× SSC three times for 5 min at 45 °C, rinsed in PBS twice and counter-stained with DAPI. For immuno-FISH, the following steps were added after the washes in 0.1× SSC at 45 °C. Slides were washed twice in 2× SSC for 5 min each at room temperature (RT) and blocked in TNT buffer (0.1 M Tris-HCl pH 7.5, 0.15 M NaCl, 5% BSA) for 1 h at RT. Anti-lamin antibody was applied overnight at 4 °C in TNT buffer, followed by three 5 min washes in wash buffer (0.1 M Tris-HCl pH 7.5, 0.15 M NaCl). Secondary antibody was applied in TNT buffer for 2–3 h at RT, followed by washes in wash buffer and DAPI staining as described above. Cells were mounted on the glass slide with Fluoromount-G (Southern Biotech, Birmingham, AL).
Three-dimensional image stacks were taken with a Leica SP5 confocal microscope (Leica Microsystems, Exton, PA) using a 63× oil immersion objective with a numerical aperture of 1.4, and zoom 3.2±0.2. To perform DNA FISH on target and non-target probes, regions of approximately 15 kb were chosen in the genome, excluding repeated sequence, and amplified by PCR from genomic DNA with 5–10 primer pairs, each covering around 0.5–3 kb. Primer sequences are available on request.

Accession numbers

Microarray data are available in the ArrayExpress database [42] under accession numbers E-MEXP-2523 (gene expression data) and E-MEXP-2525 (ChIP-chip data).

Supporting Information

Figure S1 Processing of ChIP-chip data and NAR determination for Nup153. All ChIP-chip assays were performed in triplicate. Raw data were GCRMA-normalised. Triplicates were averaged and binding ratios were calculated relative to average intensities from triplicates of 10% input DNA. Data were then smoothed by averaging intensities within a 500 bp sliding window centred on each probe. We then calculated the density of positively bound probes in 10 kb windows centred on each probe, and used a cut-off of 70% to determine Nucleoporin Associated Regions (NARs). Profiles of the different analysis steps are illustrated for a 200 kb region of chromosome X in SL-2 cells: GCRMA-normalised intensities for individual probes across three biological replicates (light orange); mean intensity values of the three biological replicates for Nup153 binding (orange); GCRMA-normalised intensities and mean values for the input DNA control (light and dark grey); ratios of Nup153-binding and control mean intensity signals (light blue); smoothed ratios using a 500 bp sliding window centred on each probe (dark blue); density of positively bound probes in 10 kb windows centred on each probe (solid black line) and 70% threshold for detection of NARs (dotted red line); Nup153 NARs (dark red boxes); FlyBase genes in the forward and reverse strand are represented in light grey; coordinates represent the position on the corresponding chromosome. A similar procedure was used to determine NARs in male and female samples for Nup153 and Mtor. Found at: doi:10.1371/journal.pgen.1000846.s001 (0.6 MB PDF)
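The window-density and cut-off steps in this legend can be sketched as below. The legend does not state how neighbouring probes are joined into a region, so this sketch assumes that consecutive probes whose window density exceeds the cut-off are merged, and that the 10 kb window is inclusive at both ends. Function names and the toy data are hypothetical.

```python
from bisect import bisect_left, bisect_right


def positive_density(positions, ratios, window=10_000):
    """Fraction of probes with a positive log2 ratio in a window around each probe.

    positions must be sorted; ratios are the smoothed log2 (ChIP/input) values.
    """
    half = window / 2
    density = []
    for pos in positions:
        lo = bisect_left(positions, pos - half)
        hi = bisect_right(positions, pos + half)
        positive = sum(1 for r in ratios[lo:hi] if r > 0)
        density.append(positive / (hi - lo))
    return density


def call_regions(positions, density, cutoff=0.7):
    """Merge runs of consecutive probes whose density exceeds the cut-off.

    Assumption: a region spans from the first to the last probe of each run.
    """
    regions = []
    start = None
    previous = None
    for pos, d in zip(positions, density):
        if d > cutoff:
            if start is None:
                start = pos
            previous = pos
        elif start is not None:
            regions.append((start, previous))
            start = None
    if start is not None:
        regions.append((start, previous))
    return regions


# Toy chromosome: one probe per kb, first half bound, second half unbound.
toy_positions = list(range(0, 40_000, 1_000))
toy_ratios = [1.0] * 20 + [-1.0] * 20
toy_density = positive_density(toy_positions, toy_ratios)
print(call_regions(toy_positions, toy_density))
```

In the toy data the called region stops a few probes before the last bound probe, because windows near the transition already contain too many unbound probes to pass the 70% cut-off.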

Image analysis of FISH localisation

To determine quantitatively the three-dimensional position of the FISH signal within the nucleus, we used the ImageJ software [49]. The nuclear envelope was initially defined by segmentation of the DAPI image using the automated Otsu thresholding algorithm. The boundary definition was then refined against the lamin-staining, flagging significant deviations between the two signals if necessary. Figure S10 shows a schematic diagram of the procedure. We also display a distribution of radii calculated for 62 nuclei, demonstrating that the DAPI and lamin signals provide very consistent definitions of the nuclear boundary. Segmented images were then stacked in order to recreate the three-dimensional nucleus. Next we calculated the distances between the FISH signal and the nuclear boundary (Figure S10, S11, Videos S1, S2, S3, S4). The segmented three-dimensional images of the nucleus were converted into a three-dimensional distance map using the Local Thickness plug-in (http://www.optinav.com/Local_Thickness.htm). We thresholded the FISH images to identify voxels within the nucleus that corresponded to the FISH signal and we measured the distances between all such voxels and the closest point on the nuclear boundary. For each nucleus we calculated the mean distance, and then for each test locus, we used the set of mean distances for all nuclei to plot the distance distribution. Similar

Figure S2 Validation of Nup153 and Mtor target and non-target genes by ChIP-qPCR. Chromatin prepared from SL-2 cells was used for immunoprecipitation using Nup153 (blue) and Mtor (grey) antibodies. Recovered DNA (% Input) was analysed by qPCR using primers in the beginning (P1), middle (P2) and end (P3) of genes as shown. Error bars represent standard deviation obtained from three independent experiments. Found at: doi:10.1371/journal.pgen.1000846.s002 (0.04 MB PDF)

Figure S3 Nup153 and Mtor NARs in Kc cells. (A) Karyotype representation of Nup153 and Mtor NARs across the genome in Kc cells. (B) Magnified view of a 1 Mb region of chromosomal arm 2L. Tracks represent the smoothed ChIP/input binding ratio for Nup153 and Mtor (dark grey), the density of positively bound probes calculated in 10 kb windows centred on each probe (solid grey line), and NARs (red boxes), for regions with a density of positively bound probes above 70%. (C) Magnified view of a 100 kb region in (B). Found at: doi:10.1371/journal.pgen.1000846.s003 (2.11 MB PDF)

Figure S4 Correlation between Nup153 and Mtor binding in Kc cells. (A) Smoothed scatter plot displaying the ChIP/input binding ratios for Nup153 and Mtor (Pearson r = 0.88). (B) Bar chart representing the overlap in NARs defined by Nup153 and Mtor binding profiles. (C) Histogram of Nup153 and Mtor NAR length distributions. Found at: doi:10.1371/journal.pgen.1000846.s004 (0.56 MB PDF)

Figure S5 H4K16Ac and H3K27me3 are mutually exclusive throughout the genome. (A) Detail view of H3K27me3 and H4K16Ac modifications in a 1 Mb region of chromosome X in SL-2 cells. H3K27me3 data were obtained from Schwartz et al (2006) [36] and H4K16Ac data were obtained from Kind et al (2008) [32]. For each modification, we used the cut-offs from the original publications to define significant signals. (B) Smoothed scatter plot of H4K16Ac and H3K27me3 modification intensity values. Only data points with significant intensity values are shown. Plot areas with high data density are shown in dark red; plot areas with low data density are shown in dark blue. Found at: doi:10.1371/journal.pgen.1000846.s005 (1.39 MB PDF)

Figure S6 Immunostaining of Nup153 and Mtor in salivary glands. Immunostaining of Nup153 and Mtor in salivary glands isolated from 3rd instar male larvae. Salivary glands were co-immunostained with either MSL1 antibody or pre-immune serum (Pre-Mtor, Pre-Nup153) and serum (Mtor and Nup153). Both Nup153 and Mtor show predominantly nuclear rim staining but there is also some diffuse staining within the nucleus. X chromosomal territory is observed with MSL1 staining. Found at: doi:10.1371/journal.pgen.1000846.s006 (0.23 MB PDF)

Figure S7 RNAi-mediated depletion of Nup153 in SL-2 cells. (A) Whole-cell extracts were obtained from cells treated with EGFP or Nup153 dsRNA for 0, 3, 5, or 7 days, and separated on SDS-PAGE followed by western blot analysis using Nup153 and Tubulin antibodies. Size markers (kDa) are indicated on the right side. (B) Cells treated with EGFP or Nup153 dsRNA were used for immunofluorescence confocal microscopy. Nup153, Nup50, and Lamin antibodies were used for triple-immunostaining and pseudocolours were added using the ImageJ software. A similar strategy was used for MOF, MSL1 and Lamin triple immunostaining. Arrows indicate residual MSL1- or MOF-staining in Nup153-depleted cells. Found at: doi:10.1371/journal.pgen.1000846.s007 (0.75 MB PDF)

Figure S8 MOF-binding to autosomal promoters is affected in Nup153-depleted cells. Chromatin prepared from cells treated with EGFP (black) or Nup153 (grey) dsRNA was used for immunoprecipitation using MOF antibody. MOF-binding was scored on six autosomal target promoters. Recovered DNA was analysed by qPCR and is shown as percentage of input DNA (% Input). Error bars represent standard deviations obtained from three independent experiments. Found at: doi:10.1371/journal.pgen.1000846.s008 (0.33 MB PDF)

Figure S9 Control RNAi-mediated depletion of Nup50 in SL-2 cells. (A) Whole-cell extracts were made from cells treated with EGFP or Nup50 dsRNA for 0, 5, or 7 days, and separated on SDS-PAGE followed by western blot analysis using Nup50 and Tubulin antibodies. Size markers (kDa) are indicated on the right side. (B) Cells treated with EGFP or Nup153 dsRNA were used for immunofluorescence confocal microscopy using antibodies against Nup153, Nup50, Mtor, MOF, and MSL1 (shown in red). Nup153, Mtor antibodies, and Hoechst were used for triple-immunostaining and pseudocolours were added using the ImageJ software. A similar strategy was used for MOF, MSL1, and Lamin triple immunostaining. Found at: doi:10.1371/journal.pgen.1000846.s009 (0.43 MB PDF)

Figure S10 Segmentation of DAPI image. (Top left) Nuclei labelled with lamin and stained with DAPI were used for the development of the segmentation method (green = lamin, blue = DAPI). (Top centre) The lamin signal was thresholded and then reduced to a single-pixel rim (green). The detected rim was overlaid on the original lamin image. (Top right) The segmentation strategy was verified by measuring the deviation between the DAPI segmented image (black/white) and the lamin rim (green). Pink lines show how these deviations were measured. (Bottom) Probability density distributions of the mean radii of 62 individual nuclei calculated using the DAPI or lamin signals. The median radius for the DAPI segmented edge was 2.45±0.33 µm and for the lamin signal was 2.41±0.35 µm. Found at: doi:10.1371/journal.pgen.1000846.s010 (0.05 MB PDF)

Figure S11 Measurement of three-dimensional distance of FISH signals from the nuclear periphery. Three-dimensional brightest-point projection images of a simulated nucleus showing (A) peripheral localisation and (B) non-peripheral localisation. Outlines of the nuclear periphery in each z-slice (blue contours, DAPI channel) and FISH signal (red, FISH channel) are shown. The nucleus is rotated about the x-axis in 30 degree increments from the top-left to the bottom-right panel. Three-dimensional brightest-point projection images of real nuclei with (C) NAR locus T4 and (D) control locus N2. Bar = 5 µm. See also Videos S1-S4. Found at: doi:10.1371/journal.pgen.1000846.s011 (0.55 MB PDF)

Table S1 Enrichment of active and repressive markers in NARs and non-NARs in SL-2 and Kc cells. Found at: doi:10.1371/journal.pgen.1000846.s012 (0.05 MB PDF)

Table S2 Enrichment of H4K16Ac and gene density in NARs versus non-NARs for SL-2 and Kc cells. Found at: doi:10.1371/journal.pgen.1000846.s013 (0.05 MB PDF)

Table S3 Target (T) and non-target (N) regions used for FISH analysis. Start and end show the chromosomal localization coordinates according to release 3 of the Drosophila melanogaster genome (R5.11). Genes in each probe set are also indicated. Individual genes within these regions, which were further tested by qPCR in this study, are indicated in red. Found at: doi:10.1371/journal.pgen.1000846.s014 (0.08 MB PDF)

Table S4 This table accompanies Figure 4. Chromosomal location of the target and non-target regions is indicated. The total number of pixels and nuclei counted is also indicated, together with the statistical significance for each target or non-target region separately and the average for each category. Found at: doi:10.1371/journal.pgen.1000846.s015 (0.06 MB PDF)

Text S1 Primer sequences for quantitative PCR; primer sequences for RNAi. Found at: doi:10.1371/journal.pgen.1000846.s016 (0.09 MB PDF)

Video S1 3D projection movie of a simulated nucleus with FISH signal at nuclear periphery. Nuclear envelope is shown as blue contours and FISH signal is shown in red. Montages of the movies are shown in Figure S11. Found at: doi:10.1371/journal.pgen.1000846.s017 (0.73 MB MOV)


Video S2 3D projection movie of a simulated nucleus with FISH signal located between the periphery and nuclear centre. Found at: doi:10.1371/journal.pgen.1000846.s018 (0.73 MB MOV)

Video S3 3D projection movie of real nucleus with NAR locus T4 localised to the periphery. Found at: doi:10.1371/journal.pgen.1000846.s019 (0.62 MB MOV)

Video S4 3D projection movie of real nucleus with control locus N2 localised at the interior. Found at: doi:10.1371/journal.pgen.1000846.s020 (0.83 MB MOV)

Acknowledgments

We thank members of the laboratory for helpful discussions and critical reading of the manuscript. We thank Bas van Steensel and Rainer Peperkok for support and Maarten Fornerod and Martin Hetzer for communication prior to publication. We thank Frederick Bantignies and Patrick Meister for advice on FISH protocols and Sebastein Huet for advice on FISH analysis. We are grateful to Paul Bertone for contributing to earlier analysis of the RNAi gene expression data. We thank Yosef Gruenbaum for the gift of Drosophila Lamin antibody. We thank Vladimir Benes, Tomi Bähr-Ivacevic, and Jos de Graaf for microarray hybridization and Stefan Terjung, the ALMF team and Leica Microsystems for support. We are grateful to Christian Haering, Lars Steinmetz, Francois Spitz, and Jan Ellenberg for critical reading of the manuscript.

Author Contributions

Conceived and designed the experiments: NML AA. Performed the experiments: JMV RS JK. Analyzed the data: JMV RS JK KM NML AA. Contributed reagents/materials/analysis tools: JMV RS JK KM NML AA. Wrote the paper: NML AA.

References 24. Goldman RD, Shumaker DK, Erdos MR, Eriksson M, Goldman AE, et al. (2004) Accumulation of mutant lamin A causes progressive changes in nuclear architecture in Hutchinson-Gilford progeria syndrome. Proc Natl Acad Sci U S A 101: 8963–8968. 25. Paddy MR, Belmont AS, Saumweber H, Agard DA, Sedat JW (1990) Interphase nuclear envelope lamins form a discontinuous network that interacts with only a fraction of the chromatin in the nuclear periphery. Cell 62: 89–106. 26. Brown CR, Kennedy CJ, Delmar VA, Forbes DJ, Silver PA (2008) Global histone acetylation induces functional genomic reorganization at mammalian nuclear pore complexes. Genes Dev 22: 627–639. 27. Mendjan S, Taipale M, Kind J, Holz H, Gebhardt P, et al. (2006) Nuclear pore components are involved in the transcriptional regulation of dosage compensation in Drosophila. Mol Cell 21: 811–823. 28. Lucchesi JC, Kelly WG, Panning B (2005) Chromatin remodeling in dosage compensation. Annu Rev Genet 39: 615–651. 29. Straub T, Becker PB (2007) Dosage compensation: the beginning and end of generalization. Nat Rev Genet 8: 47–57. 30. Iyer VR, Horak CE, Scafe CS, Botstein D, Snyder M, et al. (2001) Genomic binding sites of the yeast cell-cycle transcription factors SBF and MBF. Nature 409: 533–538. 31. Ren B, Robert F, Wyrick JJ, Aparicio O, Jennings EG, et al. (2000) Genomewide location and function of DNA binding proteins. Science 290: 2306–2309. 32. Kind J, Vaquerizas JM, Gebhardt P, Gentzel M, Luscombe NM, et al. (2008) Genome-wide analysis reveals MOF as a key regulator of dosage compensation and gene expression in Drosophila. Cell 133: 813–828. 33. Farnham, PJ (2009) Insights from genomic profiling of transcription factors. Nat Rev Genet 10: 605–616. 34. Hubbell E, Liu WM, Mei R (2002) Robust estimators for expression analysis. Bioinformatics 18: 1585–1592. 35. Muse GW, Gilchrist DA, Nechaev S, Shah R, Parker JS, et al. (2007) RNA polymerase is poised for activation across the genome. 

PLoS Genetics | www.plosgenetics.org


February 2010 | Volume 6 | Issue 2 | e1000846


Nucleoporins Bind Active Chromatin



Maternal Ethanol Consumption Alters the Epigenotype and the Phenotype of Offspring in a Mouse Model

Nina Kaminen-Ahola1, Arttu Ahola1,2, Murat Maga3, Kylie-Ann Mallitt1, Paul Fahey1, Timothy C. Cox3, Emma Whitelaw1,4, Suyinn Chong1,4*

1 Division of Genetics and Population Health, Queensland Institute of Medical Research, Herston, Australia
2 Department of Biological and Environmental Sciences, University of Helsinki, Helsinki, Finland
3 Division of Craniofacial Medicine, Department of Pediatrics, University of Washington, Seattle, Washington, United States of America
4 Griffith Medical Research College, Griffith University and the Queensland Institute of Medical Research, Herston, Australia

Abstract

Recent studies have shown that exposure to some nutritional supplements and chemicals in utero can affect the epigenome of the developing mouse embryo, resulting in adult disease. Our hypothesis is that epigenetics is also involved in the gestational programming of adult phenotype by alcohol. We have developed a model of gestational ethanol exposure in the mouse based on maternal ad libitum ingestion of 10% (v/v) ethanol between gestational days 0.5–8.5 and observed changes in the expression of an epigenetically-sensitive allele, Agouti viable yellow (Avy), in the offspring. We found that exposure to ethanol increases the probability of transcriptional silencing at this locus, resulting in more mice with an agouti-colored coat. As expected, transcriptional silencing correlated with hypermethylation at Avy. This demonstrates, for the first time, that ethanol can affect adult phenotype by altering the epigenotype of the early embryo. Interestingly, we also detected postnatal growth restriction and craniofacial dysmorphology reminiscent of fetal alcohol syndrome in congenic a/a siblings of the Avy mice. These findings suggest that moderate ethanol exposure in utero is capable of inducing changes in the expression of genes other than Avy, a conclusion supported by our genome-wide analysis of gene expression in these mice. In addition, offspring of female mice given free access to 10% (v/v) ethanol for four days per week for ten weeks prior to conception also showed increased transcriptional silencing of the Avy allele. Our work raises the possibility of a role for epigenetics in the etiology of fetal alcohol spectrum disorders, and it provides a mouse model that will be a useful resource in the continued efforts to understand the consequences of gestational alcohol exposure at the molecular level.

Citation: Kaminen-Ahola N, Ahola A, Maga M, Mallitt K-A, Fahey P, et al. (2010) Maternal Ethanol Consumption Alters the Epigenotype and the Phenotype of Offspring in a Mouse Model. PLoS Genet 6(1): e1000811. doi:10.1371/journal.pgen.1000811

Editor: Jeannie T. Lee, Massachusetts General Hospital, Howard Hughes Medical Institute, United States of America

Received June 22, 2009; Accepted December 10, 2009; Published January 15, 2010

Copyright: © 2010 Kaminen-Ahola et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This work was supported by an Australian Research Council (ARC) Discovery Project grant (DP0878192) to EW and SC. NKA received funding from the Sigrid Juselius Foundation, Academy of Finland, Finnish Alcohol Research Foundation, the Finnish Cultural Foundation and Arvo and Lea Ylppo Foundation. TCC was supported by the Seattle Childrens Hospital and the Murdoch Charitable Trust. EW is a National Health and Medical Research Council (NHMRC) Australia Fellow. Part of this work was done in conjunction with the Collaborative Initiative on Fetal Alcohol Spectrum Disorders (CIFASD), which is funded by grants from the National Institute on Alcohol Abuse and Alcoholism (NIAAA), U24 AA014811. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: suyinn.chong@qimr.edu.au

Introduction

While it is well-recognized that gestational exposure to environmental triggers can lead to compromised fetal development and adult disease in humans [1], the underlying molecular mechanisms remain unknown. There is increasing evidence in animal models that environmental factors can affect gene expression via epigenetic modifications such as DNA methylation [2–6]. One way of detecting such events is to use reporters whose expression is closely linked to their epigenetic state. Such epigenetically sensitive alleles are also known as metastable epialleles, and the best known example in the mouse is Agouti viable yellow (MGI:1855930) or Avy [7]. Avy is a dominant mutation of the murine Agouti (A) locus, caused by the insertion of an intracisternal A-particle (IAP) retrotransposon upstream of the Agouti coding exons. The activity of Avy is variable among genetically identical mice, resulting in mice with a range of coat colors, from yellow to mottled to agouti (termed pseudoagouti) [8]. The expression of Avy is known to correlate with DNA methylation at a cryptic long terminal repeat (LTR) promoter located at the 3′ end of the inserted IAP. Specifically, hypomethylation is associated with constitutive ectopic Agouti expression and a yellow coat, while hypermethylation correlates with cryptic promoter silencing and a pseudoagouti coat [9]. We have previously shown that DNA methylation at Avy is reprogrammed in early development at the same time that the rest of the genome is undergoing epigenetic reprogramming [10].

Alcohol consumption is widespread in our society, but it is also recognized as the leading preventable cause of birth defects and mental retardation [11,12]. High levels of alcohol consumption during pregnancy can result in fetal alcohol syndrome (FAS), which is characterized by prenatal and postnatal growth restriction, craniofacial dysmorphology and structural abnormalities of the central nervous system. The clinical features of FAS are variable and include a range of other birth defects, as well as educational and behavioral problems [13]. This syndrome is the most extreme form of a range of disorders that are known as fetal alcohol spectrum disorders (FASDs) [14]. Approximately 5% of the children of mothers who have drunk heavily during pregnancy have FAS [15], and studies have shown that the dose, time and


January 2010 | Volume 6 | Issue 1 | e1000811


Epigenetics and Gestational Alcohol Exposure

reprogramming in the etiology of FASD and provides researchers with a relevant mouse model of the human disorder.

Author Summary

In humans it has been known for some time that exposure to environmental insults during pregnancy can harm a developing fetus and have life-long effects on the individual’s health. A well known example is fetal alcohol syndrome, where the children of mothers that consume large amounts of alcohol during pregnancy exhibit growth retardation, changes to the shape and size of the skull, and central nervous system defects. At present the molecular events underlying fetal alcohol syndrome are unknown. We have developed a model of alcohol exposure in the mouse, in which the genetics and the environment can be strictly controlled. We find that chronic exposure of the fetus to a physiologically relevant amount of alcohol during the first half of pregnancy results in epigenetic changes at a sensitive reporter gene and produces fetal alcohol syndrome-like features in some mice. Our model is a useful tool to study the underlying causes of fetal alcohol syndrome, and our work raises the interesting possibility that the long-term physical effects of alcohol exposure during pregnancy are mediated by epigenetic changes established in the fetus and then faithfully remembered for a lifetime. In the future, such epigenetic changes could be used as markers for the preclinical diagnosis and treatment of fetal alcohol spectrum disorders.

Results

In this study, Avy was used primarily as a sensitive reporter of epigenetic changes in response to maternal ethanol consumption. The C57BL/6J mouse is null (a) at the Agouti locus, so it has a black coat color. Avy is a gain-of-function, semi-dominant mutation and so the coat color of heterozygous (Avy/a) mice in the C57BL/6J background is a direct read out of Avy transcriptional activity and DNA methylation. The nature of the matings used in this study, an Avy/a male crossed with an a/a female, means that only 50% of the offspring will inherit the Avy allele and be useful for coat color phenotyping. The remaining (a/a) offspring will be black.

To study the effects of gestational ethanol exposure, female a/a C57BL/6J mice were supplied with 10% (v/v) ethanol in their drink bottles for eight days after fertilization by a congenic male carrying the Avy allele (n = 46 litters, 242 total offspring, 109 Avy/a offspring). To evaluate the effects of preconceptional ethanol exposure, female a/a mice were given 10% (v/v) ethanol for four days per week for ten weeks prior to fertilization (n = 22 litters, 131 total offspring, 69 Avy/a offspring). The Avy allele was passed through the male germ line to avoid the bias associated with maternal transmission, where epigenetic marks can be incompletely cleared between generations [9]. Control mice were given water instead of ethanol (n = 37 litters, 189 total offspring, 91 Avy/a offspring). Maternal ethanol exposure during gestation did not significantly alter Mendelian inheritance of the Avy allele (data not shown) or litter size (control 5.1 ± 0.4, ethanol exposed 5.2 ± 0.3, mean ± SEM, Student’s t-test, p = 0.9).

The establishment of epigenetic marks at Avy occurs during early embryogenesis and is a probabilistic event. The resulting variable expression of Avy among genetically identical mice produces individuals with a predictable range of coat colors. We found that, in the absence of any treatment, 21% of the offspring of Avy/a sires were yellow, 66% were mottled and 13% were pseudoagouti (Figure 1). Gestational ethanol exposure resulted in a higher proportion of pseudoagouti (Pearson’s chi-square test, p < 0.05). Twenty-eight percent of offspring were pseudoagouti compared with 13% in the control group (Figure 1). Preconceptional ethanol exposure produced a similar trend (Pearson’s chi-square test, p < 0.05). This shows that ethanol exposure can influence the establishment of Avy expression early in development. It increases the probability of transcriptional silencing at this particular locus.

To confirm that the coat color correlated with DNA methylation at the Avy allele in gestationally exposed mice, 11 CpG dinucleotides in the LTR cryptic promoter of the Avy IAP were subjected to bisulfite sequencing (Figure 2). The results showed that, as expected, ethanol-exposed yellow mice were hypomethylated compared to ethanol-exposed pseudoagouti mice. Interestingly, atypical hypermethylated clones were found in five out of six yellow mice in the ethanol-exposed group, but they were clearly not sufficient to affect coat color. In the ethanol-exposed group 11% of the CpG dinucleotides were methylated compared to 2% in the control group. Using this measure a Student’s t-test or non-parametric equivalent was unsuitable because the data did not meet the distribution requirements of being spread on a continuum. So we analyzed allele-specific methylation. In the ethanol-exposed group 23% of clones showed evidence of methylation, n = 91, compared with 8% of clones in the control group, n = 71 (Pearson’s chi-square test, p < 0.01). In contrast, total DNA methylation level in ethanol-exposed pseudoagouti mice (61%) was not significantly different to that observed in the controls (65%, Student’s t-test, p = 0.27).
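The Mendelian inheritance check is reported as "data not shown". As an illustration only, one way such a check could be run is a chi-square goodness-of-fit test of the reported Avy/a counts against the expected 1:1 ratio. The choice of test is an assumption here and is not necessarily the analysis the authors used; the function name is hypothetical.

```python
import math

def chi_square_1to1(carriers, total):
    """Goodness-of-fit test against a 1:1 ratio (1 degree of freedom)."""
    expected = total / 2
    non_carriers = total - carriers
    chi2 = ((carriers - expected) ** 2 + (non_carriers - expected) ** 2) / expected
    # For 1 d.f. the chi-square tail probability equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# (Avy/a offspring, total offspring) as reported in the Results text
groups = {
    "control": (91, 189),
    "gestational ethanol": (109, 242),
    "preconceptional ethanol": (69, 131),
}

results = {name: chi_square_1to1(*counts) for name, counts in groups.items()}
for name, (chi2, p) in results.items():
    print(f"{name}: chi2 = {chi2:.2f}, p = {p:.2f}")
```

Under this sketch none of the three groups departs significantly from the 50% expectation, which agrees with the statement in the text.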

duration of ethanol exposure are critical [16,17]. There are a number of mouse models of FAS that have reproduced some of the phenotypic characteristics of the human disorder, particularly the craniofacial abnormalities [16,18,19]. It should be noted that these studies used acute ethanol exposures between gestational days (GDs) 7 and 9 and high concentrations; generally two intraperitoneal injections of 0.015 ml of ~25% (v/v) ethanol per gram of body weight over a 4 hour interval resulting in ataxia and lethargy. These studies only examined the fetal outcomes (GDs 8–18) of ethanol exposure and did not assay offspring either after birth or as adults. There are some rodent studies of the effects of gestational exposure to moderate amounts of ethanol, but these have only identified neurological and behavioral deficits [20].

The molecular mechanisms underlying FAS are unknown. Some studies have focused on the toxic effects of acetaldehyde, the first metabolite of ethanol [18,21]. Acute ethanol exposure has also been found to result in increased cell death in the developing central nervous system and neurological anomalies in rodents and other animal models [22,23]. The idea that epigenetic changes are involved has been raised but evidence in support of this hypothesis has, so far, been weak. Garro and colleagues [24] detected a small decrease in the level of global methylation of fetal DNA after acute ethanol administration from GDs 9–11. Bielawski et al. [25] reported decreased DNA methyltransferase 1 (Dnmt1) messenger RNA levels in rat sperm after nine weeks of paternal ethanol exposure. Haycock and Ramsey [26] studied imprinting of the H19/Igf2 locus in preimplantation mouse embryos after maternal ethanol exposure. Despite severe growth retardation of embryos, they did not find epigenetic changes at the H19 imprinting control region.

Here we have developed a mouse model of chronic ethanol exposure (overt signs of intoxication are not observed) that produces measurable phenotypes in adults. We find that maternal ethanol consumption either before or after fertilization affects the expression of an epigenetically sensitive allele, Avy, in her offspring and that, at least in the latter case, can also impact postnatal body weight and skull size and shape in a manner consistent with FASD. Our work raises the possibility of a role for epigenetic


Figure 1. Gestational and preconceptional ethanol exposure produced a higher proportion of pseudoagouti Avy mice. In each pedigree, the black square represents the a/a dam and the diamonds represent her Avy/a offspring. Avy/a sires and all a/a offspring have been excluded from the pedigrees. White diamonds represent yellow offspring, light gray diamonds represent mottled offspring and dark gray diamonds represent pseudoagouti offspring. The percentage of offspring in each coat color category is indicated. The gestational ethanol exposure group and the preconceptional ethanol exposure group were statistically significantly different to the control group using Pearson’s chi-square test (p < 0.05). doi:10.1371/journal.pgen.1000811.g001
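To make the coat color comparison concrete, the sketch below runs a Pearson chi-square test on a 2×2 table of pseudoagouti versus other Avy/a offspring. Two assumptions apply: the counts are reconstructed from the rounded percentages in the text (13% of 91 control and 28% of 109 gestationally exposed offspring), and the table is collapsed to two categories because the yellow and mottled percentages for the ethanol group are not given here. The authors' own test may have used all three coat color categories.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))  # exact tail probability for 1 d.f.
    return chi2, p_value

# Counts reconstructed from rounded percentages (an assumption, not raw data)
control_total, ethanol_total = 91, 109
control_pseudo = round(0.13 * control_total)
ethanol_pseudo = round(0.28 * ethanol_total)

table = [
    [control_pseudo, control_total - control_pseudo],
    [ethanol_pseudo, ethanol_total - ethanol_pseudo],
]
chi2, p_value = chi_square_2x2(table)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

With these reconstructed counts the difference remains significant at the 0.05 level, in line with the caption.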

addition to inter-individual variation, some of the genes were consistently differentially expressed in the ethanol-exposed group (Table S1). Twelve genes were significantly down-regulated (p < 0.05) in the ethanol-exposed mice. Three of these, LIM domain and actin binding 1 (Lima1), also known as Eplin, Suppressor of cytokine signaling 2 (Socs2) and CDK5 and Abl enzyme substrate 1 (Cables1), have been associated with growth [29–32]. Three, Socs2, Very low density lipoprotein receptor (Vldlr) and Cables1, have been associated with development of the nervous system [33–37] and one, Hepcidin antimicrobial peptide (Hamp1), has been reported to be downregulated in the livers of alcohol-fed rats [38].

We next focused on identifying the characteristic features of FAS in a/a pups exposed to moderate levels of alcohol in utero. All pups (from first litters) were weighed at three weeks of age. It was particularly important not to study Avy mice in these experiments because of the effects on body weight due to Agouti expression. Because litter size is known to influence body weight at weaning, we initially restricted our analysis to litters of 4–5 pups. The gestational ethanol exposure group consisted of 22 offspring, while the control group consisted of 26 offspring. The results (Figure 3) show that the mean weight of offspring of dams that consumed ethanol was significantly lower than that of controls (Student’s t-test, p < 0.05). A second analysis included litter size as a random effect. Analysis of Variance of weight at 3 weeks, after adjustment for litter size, confirmed that the mean weight of the ethanol-exposed group (n = 73) was statistically significantly smaller than the mean weight of the controls (n = 44, p < 0.001, data not shown).

The heads of 28–30 day old a/a mice (seven mice from the gestational ethanol exposure group and 10 control mice) were subjected to micro-computed tomography, and three-dimensional computer reconstructions at 18 µm resolution were made of each skull. Visual inspection of the reconstructions revealed an obviously smaller skull size in the ethanol group compared to controls. In addition, differences in shape in a few, but not all, individuals in the ethanol group were apparent. Most notable was the marked leftward deviation of the midface in one male (Figure 4B) and a significantly reduced interfrontal bone in one female (Figure 4C). To provide more quantitative information on skull shape, the 3D co-ordinates of thirty-four landmarks were recorded for each skull and used in various mathematical-based shape and form analyses.

Equivalent results were obtained from a random effects model which allowed for the clustering of clones within mice (p = 0.23). Bisulfite sequencing was also carried out on control and ethanol-exposed mottled mice and we found the results extremely variable from one mouse to the next within both groups (Figure S1). Presumably, this is the result of the small size of the tissue sample (tail tip). The variegated expression in mottled mice means that any one sample could represent only one clonal patch, which could harbour an active or an inactive Avy allele and not represent the true methylation state of the whole animal. For this reason mottled mice were not used in our analyses.

The effects of Avy expression are pleiotropic. For example, yellow mice exhibit hyperphagia, hyperglycemia, non-insulin-dependent diabetes and adult onset obesity [27]. We did not assay these other phenotypes following ethanol exposure in our mice as their relevance to humans is questionable since no human ortholog of the Avy allele has been identified. So, while Avy was initially useful as a sensitive indicator of epigenetic changes, any further study of FAS-like phenotypes must necessarily focus elsewhere in the genome. For this reason and the fact that variable Agouti expression would confound many phenotypes, all subsequent analyses were performed on the congenic a/a siblings of the Avy mice.

In the mouse, IAPs are present at approximately 1,000 copies per haploid genome [28]. To see if gestational ethanol exposure changed the methylation level of IAPs globally, we performed bisulfite sequencing using PCR primers that anneal to all IAP LTRs and analyzed ten CpG sites in both the tail and forebrains of a/a mice. Tail DNA from eight ethanol-exposed mice (66 clones total) and eight control mice (65 clones total) were compared, and forebrain DNA from five ethanol-exposed mice (33 clones total) and five control mice (43 clones total) were compared using the Student’s t-test. All samples were highly methylated and no differences between the ethanol exposure group and controls were detected (Figure S2). This suggests that only a subset of IAPs, perhaps those that are usually hypomethylated, are sensitive to ethanol exposure.

To detect changes in gene expression genome-wide, we performed expression arrays with liver tissue. The benefit of using liver is its homogeneity; it consists mainly of hepatocytes and consequently subtle changes will be detectable. We compared gene expression between age-matched male mice from the gestational ethanol group (three samples) and controls (four samples). In


Figure 2. Avy methylation in control offspring and offspring exposed to ethanol in utero. Only mice with 100% yellow and 100% pseudoagouti coats were assayed. Control yellow (CY) offspring are numbered 1–6, control pseudoagouti (CP) offspring are numbered 1–6, ethanol yellow (EY) offspring are numbered 1–6 and ethanol pseudoagouti (EP) offspring are numbered 1–6. Methylation was analyzed by sequencing individual clones of PCR-amplified, bisulfite-converted tail genomic DNA. Each circle represents an individual CpG. Open circles indicate an unmethylated CpG, and closed circles represent a methylated CpG. Each line represents an individual clone and the methylation pattern of one allele in one cell. Each block of lines comprises clones derived from one bisulfite conversion. The total percentage of methylated CpGs is shown above each individual and group. There are more hypermethylated clones in yellow ethanol-exposed mice (Pearson’s chi-square test, p < 0.01). There is no effect of ethanol exposure on the methylation of Avy in pseudoagouti mice (Student’s t-test, p = 0.27). doi:10.1371/journal.pgen.1000811.g002
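The two summary measures used for the bisulfite data (the percentage of methylated CpGs and the percentage of clones showing any methylation) can be sketched as follows. The clone strings are hypothetical and are not data from the study; the encoding ("1" for a closed circle, "0" for an open circle) and the function name are assumptions made for illustration.

```python
def summarize_clones(clones):
    """Return (% methylated CpGs, % clones with at least one methylated CpG)."""
    total_cpgs = sum(len(clone) for clone in clones)
    methylated_cpgs = sum(clone.count("1") for clone in clones)
    methylated_clones = sum(1 for clone in clones if "1" in clone)
    return (100 * methylated_cpgs / total_cpgs,
            100 * methylated_clones / len(clones))

# Hypothetical clones for one mouse: 11 CpGs each, "1" = methylated CpG
example_clones = [
    "00000000000",
    "00000000000",
    "11111111011",
    "00000100000",
]

cpg_pct, clone_pct = summarize_clones(example_clones)
print(f"{cpg_pct:.1f}% of CpGs methylated; {clone_pct:.1f}% of clones methylated")
```

The per-clone measure treats each sequenced clone as one allele, which is why it lends itself to a comparison of counts rather than a t-test on percentages.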

There are two classic approaches in geometric morphometrics: superimposition-based methods such as Generalized Procrustes Analysis (GPA) [39–42] or invariant analyses of shape, such as Euclidean Distance Matrix Analysis (EDMA) [43,44]. GPA involves translation, rotation and scaling of landmark data through an iterative process during which the distances between the shapes are minimized by applying least-squares criteria. We used GPA to test for the mean shape difference between the groups and to quantify and visualize localized differences in the cranial shape. We also applied the multivariate ordination method Canonical Variates Analysis (CVA) to the output of the GPA. EDMA, in contrast, uses a coordinate-free (or invariant) approach in which all the landmarks are converted into a matrix of inter-landmark distances [44,45]. We used EDMA to find the landmark pairs that show the most difference between two groups.

Analysis of skull centroid sizes confirmed the observations of statistically significantly reduced cranial size in the ethanol group, even when the smaller body weight is taken into account (ANOVA, p < 0.05; Figure 4D). Although the severe leftward deviation of the one male skull is biologically highly relevant, we


seen in individuals presenting the milder end of fetal alcohol spectrum disorder, and support the notion that this ethanol regime provides a useful and relevant model for the effects of ethanol intake in humans.

Discussion

The Avy allele has been called an epigenetic biosensor for environmental effects on the fetus [46]. Previous studies with this allele have identified a number of nutritional factors or toxic agents that affect expression and epigenetic regulation of Avy in offspring exposed in utero. For example, the addition of methyl supplements or an isoflavone (called genistein) to the diet causes hypermethylation of Avy and a shift to a pseudoagouti coat color [3,4,47], whereas bisphenol A, a chemical used in the manufacture of polycarbonate plastics, causes a shift towards hypomethylation and a yellow coat [48].

We have used the Avy allele to investigate the epigenetic effects of exposure to ethanol, and established two models of moderate exposure in the mouse. The first involves exposure during the first eight days after fertilization, a period that encompasses pre-implantation, implantation and the first two days of gastrulation. This model simulates the effects of ethanol exposure during the first trimester of pregnancy in humans. Based on previous studies [49] we estimated that the peak blood alcohol level in our model is approximately 0.12%, which is a realistic human exposure. A World Health Organization (WHO) report shows that the maximum legal blood alcohol level for driving in Organization for Economic Cooperation and Development (OECD) countries varies from 0.02–0.08% [50]. The second model involves ethanol consumption for ten weeks immediately prior to fertilization, a period that encompasses multiple cycles of oocyte maturation and ovulation. In mammals, oocyte maturation is characterized by the resumption of meiosis, extrusion of the first polar body and the accumulation of RNAs and proteins in the cytoplasm in preparation for fertilization. Studies of epigenetic reprogramming have shown that following global DNA demethylation, the period of genome-wide remethylation coincides with implantation (for embryos) and oocyte growth (for female germ cells) [51].

Our results demonstrate that gestational ethanol exposure increases the likelihood of transcriptional silencing at Avy, resulting in an agouti-colored coat. It is worth emphasizing that despite being genetically identical, not all Avy mice become pseudoagouti; rather there is a subtle ~15% increase in the proportion of pseudoagouti offspring. Previous studies have demonstrated that there is a tight correlation between DNA methylation at Avy and coat color [9]. As expected, bisulfite sequencing showed that the observed coat colors correlated with DNA methylation status in all cases. We did observe atypical hypermethylated clones in five of six yellow mice from the ethanol-exposed group that, while not sufficient to change coat color, may reflect a tendency towards increased DNA methylation in this group.

Preconceptional ethanol exposure produced a similar shift towards pseudoagouti in Avy offspring. It is likely that the two types of ethanol exposure have different modes of action on Avy because this allele is paternally-derived and not present in unfertilized oocytes. Consequently, the effects of preconceptional ethanol exposure on Avy expression will be indirect, and further work will be required to understand this mechanism. It is of interest that the coat color changes observed in Avy mice exposed to a methyl rich diet can be inherited across generations [52]. It is therefore possible that the altered coat color following alcohol exposure could also be transmitted to the next generation, but testing this was beyond the scope of this study.

Figure 3. Offspring in the gestational ethanol exposure group have a statistically significantly lower mean weight than the control group (Student’s t-test, p < 0.05). All were a/a pups from first litters. Pups were weighed at 3 weeks of age and came from litter sizes of 4 and 5. The graph shows mean ± SEM. doi:10.1371/journal.pgen.1000811.g003
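For readers unfamiliar with the summary statistics in this figure, the following minimal sketch computes a mean ± SEM and a pooled-variance Student's t statistic. The weights are hypothetical values chosen for illustration and are not data from the study; the function names are likewise assumptions. The p-value step is omitted because it requires the t distribution, which is not in the Python standard library.

```python
import math
import statistics

def mean_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))

def student_t(group_a, group_b):
    """Two-sample Student's t statistic (pooled variance) and its d.f."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / se, df

# Hypothetical 3-week weights in grams (NOT data from the study)
control = [9.8, 10.4, 9.1, 10.0, 9.6]
ethanol = [8.7, 9.0, 8.2, 9.3, 8.8]

for label, weights in (("control", control), ("ethanol", ethanol)):
    m, sem = mean_sem(weights)
    print(f"{label}: {m:.2f} ± {sem:.2f} g (mean ± SEM)")

t_stat, df = student_t(control, ethanol)
print(f"t = {t_stat:.2f} with {df} degrees of freedom")
```

The resulting t statistic would then be compared against the t distribution with the stated degrees of freedom to obtain a p-value.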

chose to exclude this sample from subsequent shape analyses because of its significant impact on the results. This permitted us to assess the significance of other more subtle changes. However, it was included in the univariate analyses of relative cranial dimensions. In the absence of this outlier, CVA still revealed greater variation in overall craniofacial shape within the ethanol group (Figure 5A). Canonical variate 1 (CV1) clearly separated the females in the ethanol group from other skulls, suggesting a more pronounced effect on female skull shape. Notably, all the females as well as the included males from the ethanol group had positive values for CV2, whereas the controls spanned both negative and positive values (see Figure 5A), indicative of a similar trend in shape alteration in response to this level of gestational exposure to ethanol. One female from the ethanol group appeared to be unaffected in terms of craniofacial shape and grouped with the control females in all analyses. We then used EDMA to assess the differences in form between the ethanol and control groups. Analysis of the 561 possible inter-landmark measurement combinations assessed by the 34 assigned landmarks (i.e. 34 × (34 − 1)/2 = 561) demonstrates that the majority show a consistent ratio below one, indicative of the fact that they are changed only relative to skull size and do not reflect localized altered shape. Nevertheless, numerous inter-landmark measures were shown to be significantly different from this mean form. The twenty most significant differences (α = 0.1) in either direction from the mean form are shown in Figure 5B. Strikingly, almost all of these forty most significant differences pertain to midfacial and palatal inter-landmark measures, highlighting the sensitivity of this region to the ethanol.
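The core idea of comparing forms through inter-landmark distances can be sketched as below. This is a simplified illustration and not the EDMA procedure itself, which also estimates mean forms across individuals and assesses significance by resampling. The toy coordinates and function names are hypothetical.

```python
import itertools
import math

def interlandmark_distances(landmarks):
    """Map each landmark pair (i, j), i < j, to its Euclidean distance."""
    return {
        (i, j): math.dist(landmarks[i], landmarks[j])
        for i, j in itertools.combinations(range(len(landmarks)), 2)
    }

def form_ratios(form_a, form_b):
    """Ratio of corresponding inter-landmark distances (form A / form B)."""
    dist_a = interlandmark_distances(form_a)
    dist_b = interlandmark_distances(form_b)
    return {pair: dist_a[pair] / dist_b[pair] for pair in dist_a}

# 34 landmarks give 34 * 33 / 2 unordered pairs
n_pairs = math.comb(34, 2)

# Toy 3-landmark forms: the "ethanol" form is the "control" form shrunk by 10%
control_form = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
ethanol_form = [(0.0, 0.0, 0.0), (1.8, 0.0, 0.0), (0.0, 1.8, 0.0)]

ratios = form_ratios(ethanol_form, control_form)
for pair, ratio in ratios.items():
    print(pair, round(ratio, 3))
```

In this toy case every ratio is the same value below one, which is the pattern expected for a pure size difference; a localized shape change would show up as pairs whose ratios depart from that common value.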
In particular, these data reveal that the ethanol group as a whole have a relatively wider inter-orbital distance (inter-landmark measure 7–11), yet relatively shorter midface than controls (reflected in multiple inter-landmark measures). This is consistent with the CVA findings. A univariate analysis of inter-landmark distances (normalized to centroid size) also supported these differences between the ethanol and control groups, in particular, confirming the greater relative cranial and inter-orbital width in both males and females compared to the sex-matched controls (Figure 5C). Females from the ethanol group also showed greater variation in ‘nare’ height (data not shown), while males from the ethanol group showed reduced rostrum length (Figure 5C). Although less severe than the changes found with acute ethanol exposure in the mouse [16], many of these differences are reminiscent of the facial changes


January 2010 | Volume 6 | Issue 1 | e1000811


Epigenetics and Gestational Alcohol Exposure

Figure 4. Variable midfacial dysmorphism and microcephaly in a/a offspring of mothers that consumed ethanol during gestation. The top panel shows 3D reconstructions of skull microCT data from 28–30 day old mice. (A) Control female. (B) Ethanol-exposed male showing marked leftward deviation of the midface. (C) Ethanol-exposed female showing an almost absent interfrontal bone that is normally characteristic of untreated C57BL/6J mice. (D) Graph of centroid size (used as an estimate of overall cranial size) against body weight for control (n = 10) and ethanol-exposed (n = 7) mice at 28–30 days old. Centroid size was determined by summing the distances from each landmark to the centroid for each individual. Centroid size is highly correlated with body weight in both treatment groups, but those in the ethanol group are on average smaller than those in the controls (ANOVA, p < 0.05). doi:10.1371/journal.pgen.1000811.g004
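The centroid size described in the legend above, and its use to normalize a linear dimension, can be sketched as follows. The code follows the legend's wording (a sum of landmark-to-centroid distances); note that geometric morphometrics more commonly defines centroid size as the square root of the summed squared distances. The function names and the four-landmark example are hypothetical.

```python
from math import dist

def centroid(landmarks):
    """Mean position of a set of landmarks given as coordinate tuples."""
    n = len(landmarks)
    dims = len(landmarks[0])
    return tuple(sum(p[k] for p in landmarks) / n for k in range(dims))

def centroid_size(landmarks):
    """Sum of the distances from each landmark to the centroid (as in the legend)."""
    c = centroid(landmarks)
    return sum(dist(p, c) for p in landmarks)

def relative_dimension(landmarks, i, j):
    """Distance between landmarks i and j (1-based) divided by centroid size."""
    return dist(landmarks[i - 1], landmarks[j - 1]) / centroid_size(landmarks)

# Invented example: four landmarks at the corners of a 2 x 2 square.
square = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 2.0, 0.0), (0.0, 2.0, 0.0)]
size = centroid_size(square)
rel_width = relative_dimension(square, 1, 2)
```

Dividing by centroid size makes the measure scale-free: doubling every coordinate of `square` doubles `size` but leaves `rel_width` unchanged.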

Our model of moderate gestational ethanol exposure produces a postnatal growth restriction phenotype and craniofacial dysmorphism in line with those seen with FASD in humans. It is possible that the postnatal growth restriction phenotype is an indirect effect; for example, the offspring may be smaller because of deficient maternal care between birth and weaning. It is unlikely that the dam would have been intoxicated or even experiencing ethanol withdrawal symptoms in the postpartum period, since ethanol exposure ceased at GD 8.5 and water was consumed for the rest of gestation (~11.5 days) and throughout nursing (21 days). Regardless of whether the phenotype is a direct physiological consequence of exposure in utero or the indirect result of altered maternal behavior, we would argue that it is ultimately a product of exposure to ethanol.

Interestingly, the effects on skull shape in these mice, like the coat color presentation in Avy mice, are variable despite the fact that the mice are isogenic. Marked variability in phenotype has also been recognized in humans: not all children of heavily drinking mothers have the typical FAS facial phenotype, and the others fall within the continuum of FASDs [14]. While this variability has been attributed to genetic differences and to differences in the level and timing of exposure, it may also be a consequence of stochastic establishment of epigenetic state.

The mechanism by which ethanol alters the establishment of epigenetic state at Avy is not known. It has been shown that chronic



Figure 5. Quantitative analysis of the effects of gestational exposure to ethanol on skull shape. (A) Output of the CVA on the Procrustes coordinates. Two components account for more than 90% of the total variation. CV1 clearly discriminates ethanol-treated female specimens from the rest of the samples. Both male and female ethanol specimens have CV2 values close to (and above) 0, whereas controls span the entire range. Changes in the skull morphology were visualized by using the loadings of the first two canonical variates. Negative values of CV1 indicate a relatively wider skull and relatively longer rostrum. Positive values of CV2 indicate relatively wider orbits (as measured by landmarks 7–11) and a shorter rostrum. (B) Output of the EDMA form difference matrix. Out of the possible 561 interlandmark distances, the top and bottom 20 are shown. Values are reported as the ratio of ethanol to control specimens for each interlandmark distance. None of the confidence intervals for the reported distances crosses 1.0, indicating that all of these dimensions differ significantly between ethanol and control specimens. The higher values at the bottom of the graph indicate relatively expanded dimensions in the ethanol specimens. 90% confidence intervals were estimated by the non-parametric bootstrap method. For landmark descriptions and positions, see Text S1 and Figure S3. (C) Mean relative size of selected linear dimensions in ethanol and control groups. Tails indicate the observed range. Measurements are divided by the centroid size to remove the size effect. Cranial width is measured as the distance between landmarks 15 and 17; rostrum length is measured as the distance between landmark 9 and the center of landmarks 1 and 2; orbital width is measured as the distance between landmarks 7 and 11. Abbreviations: FC, female control; MC, male control; FE, female ethanol; ME, male ethanol. Goodall’s F test was used to test for statistical significance of mean shape differences among groups.
doi:10.1371/journal.pgen.1000811.g005
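The non-parametric bootstrap confidence interval mentioned in panel (B) can be sketched for a single interlandmark distance. This is a simplified illustration, not the EDMA procedure itself: that method resamples whole landmark configurations, whereas the hypothetical function below resamples one distance within each group. All values are invented.

```python
import random
from statistics import mean

def bootstrap_ratio_ci(treated, control, n_boot=2000, alpha=0.10, seed=0):
    """Percentile bootstrap CI for the ratio mean(treated) / mean(control).

    Each group is resampled with replacement, independently of the other.
    alpha=0.10 gives a 90% interval.
    """
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_boot):
        t = rng.choices(treated, k=len(treated))
        c = rng.choices(control, k=len(control))
        ratios.append(mean(t) / mean(c))
    ratios.sort()
    lower_index = max(0, int((alpha / 2) * n_boot))
    upper_index = min(n_boot - 1, int((1 - alpha / 2) * n_boot) - 1)
    return mean(treated) / mean(control), ratios[lower_index], ratios[upper_index]

# Invented measurements (arbitrary units) of one distance in each group.
treated = [4.0, 4.1, 3.9, 4.2, 4.0]
control = [5.0, 5.1, 4.9, 5.2, 5.0]
point, low, high = bootstrap_ratio_ci(treated, control)

# The distance is called significantly different when the interval excludes 1.0.
excludes_one = high < 1.0 or low > 1.0
```

In this toy data set every treated value is smaller than every control value, so the whole interval lies below 1.0, corresponding to a relatively reduced dimension.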

ethanol consumption can alter DNA methylation by changing the levels of S-adenosylmethionine (SAM), which donates methyl groups to cytosine [53,54]. It is also known that chronic or acute ethanol consumption can cause post-translational histone modifications in rat tissues [55–59]. The effect of ethanol on the developing embryo has been less studied at the molecular level. Candidate gene and microarray analyses have detected changes in the level of expression (both up- and down-regulation) of numerous genes [60–62] and decreased global DNA methylation in midgestation embryos has been reported following acute ethanol treatment [24]. Recent studies have also reported altered regulation of several microRNAs by ethanol, suggesting a possible role for these RNA species in fetal alcohol syndrome [63,64].

Our gestational exposure experiments demonstrate that the epigenome is vulnerable to ethanol during early embryogenesis, a time when the DNA synthetic rate is high and there is genome-wide epigenetic reprogramming. Our preconceptional ethanol exposure experiments show that changes in the maturing oocyte (another period in development when there are widespread changes to the epigenome) can also affect offspring phenotype. The identification of microcephaly and midfacial dysmorphism in our gestational exposure model suggests effects on genes other than Avy. Our preliminary genome-wide gene expression analyses of liver in ethanol-exposed mice revealed twelve consistently down-regulated and three up-regulated genes. Ongoing work will determine if the expression of these same genes has been changed in other tissues and whether it correlates with alterations in methylation level. The variable and subtle nature of the observed phenotypes will make this work challenging, but our ultimate goal is to gain a better understanding of the molecular processes underlying FASDs.


primary PCR followed by a semi-nested PCR with 2–5 µl of template (primers were forward 5′ gaaaagagagtaagaagtaagagagagag 3′, reverse 5′ aaaatttaacacataccttctaaaaccccc 3′ and semi-nested reverse 5′ actccctcttctaaaactacaaaaactc 3′) [10]. One bisulfite conversion and PCR was performed for each pseudoagouti sample, while 3–5 independent conversions and 3 PCRs per conversion were performed for each yellow sample. Global IAP LTR sequences were amplified from bisulfite-converted tail and forebrain DNA using universal IAP primers: forward 5′ ttgatagttgtgttttaagtggtaaataaa 3′ and reverse 5′ aaaacaccacaaaccaaaatcttctac 3′ [67]. An agarose-only (no template) control was always included, and the experiment was only continued if the agarose control was negative at the end of the semi-nested PCR. PCR fragments were gel-isolated and subcloned into the pGEM-T vector (Promega, Madison, Wisconsin, United States). Individually sequenced clones were analyzed with BiQ Analyzer [68]. To avoid bias, clones from the same PCR were only accepted if they differed by either CpG or non-CpG methylation. Any clones with a conversion rate lower than 90% were also excluded from the dataset.
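The two clone acceptance rules above (at least 90% conversion, and no two accepted clones from one PCR with identical methylation) can be illustrated with a minimal sketch. This is not the BiQ Analyzer implementation. It assumes each clone is a top-strand read aligned without gaps to an unconverted reference of equal length, and that conversion is scored at non-CpG cytosines only. The function names and the short sequences are hypothetical.

```python
def cytosine_positions(reference):
    """Indices of every C in the unconverted reference."""
    return [i for i, base in enumerate(reference.upper()) if base == "C"]

def is_cpg(reference, i):
    """True if the C at index i is followed by a G."""
    return reference.upper()[i + 1:i + 2] == "G"

def conversion_rate(reference, clone):
    """Fraction of non-CpG cytosines that read as T in the clone."""
    sites = [i for i in cytosine_positions(reference) if not is_cpg(reference, i)]
    if not sites:
        raise ValueError("reference has no non-CpG cytosines")
    converted = sum(1 for i in sites if clone[i].upper() == "T")
    return converted / len(sites)

def cpg_pattern(reference, clone):
    """Methylation calls at CpG sites: True where the clone retains a C."""
    sites = [i for i in cytosine_positions(reference) if is_cpg(reference, i)]
    return tuple(clone[i].upper() == "C" for i in sites)

def filter_clones(reference, clones, min_conversion=0.90):
    """Keep well-converted clones whose cytosine states were not already seen."""
    kept, seen = [], set()
    for clone in clones:
        if conversion_rate(reference, clone) < min_conversion:
            continue  # incomplete bisulfite conversion
        state = tuple(clone[i].upper() for i in cytosine_positions(reference))
        if state in seen:
            continue  # identical at CpG and non-CpG sites: possible PCR duplicate
        seen.add(state)
        kept.append(clone)
    return kept

# Invented 13-base reference with two CpG sites and three non-CpG cytosines.
reference = "TCACGATCCGTCA"
clone_a = "TTACGATTCGTTA"  # fully converted, both CpGs methylated
clone_b = "TTATGATTTGTTA"  # fully converted, both CpGs unmethylated
clone_c = "TCACGATTCGTTA"  # one unconverted non-CpG C (2/3 conversion)
clone_d = "TTACGATTCGTTA"  # same cytosine states as clone_a
accepted = filter_clones(reference, [clone_a, clone_b, clone_c, clone_d])
```

Here `clone_c` fails the conversion threshold and `clone_d` is dropped as indistinguishable from `clone_a`, so only `clone_a` and `clone_b` are accepted.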

Materials and Methods

Ethics Statement

All animals were handled in strict accordance with good animal practice as defined by the relevant national and/or local animal welfare bodies, and all animal work was approved by the Animal Ethics Committee of the Queensland Institute of Medical Research (P986, A0606-609M).

Study Design and Animals

The mice used in this study were inbred, genetically identical C57BL/6J, and all environmental factors (e.g. cage type, environmental enrichment) were standardized. We chose a voluntary consumption strategy for ethanol exposure instead of intraperitoneal injections or intragastric administration because it produces the least amount of maternal stress. C57BL/6 mice are also known to have a strong drinking preference for 10% (v/v) ethanol over water, making them ideal for the study [65,66]. For gestational ethanol exposure, single mottled Avy/a males were caged with single 6–14 week old a/a females. The majority of females were 6–8 weeks old and virgins in both ethanol-exposed and control groups. The females were checked each morning for a vaginal plug, which indicated that mating had taken place. The day of plugging was designated GD 0.5; the male was removed from the cage and the water bottle was replaced with one containing 10% (v/v) ethanol. Pregnant females were allowed free access to the drink bottle and food at all times. The ethanol solution was changed and consumption (ml) was measured every 24 hours. The average daily consumption was 3.1 ± 0.4 ml of 10% (v/v) ethanol (or 12 g ethanol/kg body weight/day), which was not statistically significantly different from the average daily water consumption of control mice (Student’s t-test, two tailed, p = 0.8). It has been shown that in female mice, voluntary consumption of 10% (v/v) ethanol at 14 g ethanol/kg body weight/day produces an average peak blood alcohol level of ~120 mg/dl [49]. Only one out of 47 females tested refused to drink ethanol in the initial 24 hour period and was excluded from the analysis. On the final day of exposure, GD 8.5, the ethanol bottle was replaced with a bottle containing water. All dams were subjected to only one cycle of ethanol exposure.
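The conversion from a consumed volume of 10% (v/v) ethanol to a dose in g ethanol/kg body weight/day can be checked with a short calculation. The sketch assumes a density of about 0.789 g/ml for pure ethanol and a hypothetical dam weight of 20 g; body weights are not given in this section, so that value is illustrative only.

```python
ETHANOL_DENSITY_G_PER_ML = 0.789  # approximate density of pure ethanol

def ethanol_dose_g_per_kg(volume_ml, fraction_v_v, body_weight_g):
    """Daily ethanol dose in g per kg body weight.

    volume_ml: volume of ethanol solution consumed per day
    fraction_v_v: ethanol content of the solution (0.10 for 10% v/v)
    body_weight_g: body weight of the animal in grams (assumed, not from the text)
    """
    ethanol_ml = volume_ml * fraction_v_v
    ethanol_g = ethanol_ml * ETHANOL_DENSITY_G_PER_ML
    return ethanol_g / (body_weight_g / 1000.0)

# 3.1 ml/day of 10% (v/v) ethanol for an assumed 20 g mouse.
dose = ethanol_dose_g_per_kg(3.1, 0.10, 20.0)
print(round(dose, 1))  # 12.2
```

Under these assumptions the result is close to the reported 12 g ethanol/kg body weight/day.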
Offspring were left with their mothers until weaning (at 3 weeks of age), when their coat color was recorded (Avy/a mice) or they were weighed or subjected to micro-computed tomography (a/a mice). For preconceptional ethanol exposure, 6 week old a/a female mice were given 10% (v/v) ethanol for 4 days per week (4 days ethanol followed by 3 days water) for ten weeks. After treatment, at 18–22 weeks of age, they were mated with mottled Avy/a males. Avy/a offspring were weaned and phenotyped for coat color at three weeks of age. All ethanol and control exp