Epigenetics in Human Disease
#
- (rev. 11)
- Hyungyong Kim
Summary #
Chap1. Epigenetics of Human Disease #
Chap2. Methods and Strategies to Determine Epigenetic Variation in Human Disease #
2.1 Introduction #
Epigenetics is not only one of the most rapidly expanding fields of study in biomedical research but is also one of the most exciting and promising in terms of increasing our understanding of disease etiologies and of developing new treatment strategies.
Recent landmark events in this field include:
- characterization of the human DNA methylome at single-nucleotide resolution,
- discovery of CpG island shores,
- identification of new histone variants and modifications,
- and development of genome-wide maps of nucleosome positions.
Much of our increased understanding is the result of technological breakthroughs that have made it feasible to undertake large-scale epigenomic studies.
These new methodologies have enabled ever finer mapping of the epigenetic marks, such as DNA methylation, histone modifications and nucleosome positioning, that are critical for regulating the expression of both genes and noncoding RNAs [1].
In turn, we have a growing understanding of the consequences of aberrant patterns of epigenetic marks and of mutations in the epigenetic machinery in the etiology of disease.
However, there are several aspects of the methods used to analyze epigenetic variation associated with disease that present potential problems.
First, there is the choice of tissue from which the DNA is obtained. This choice depends to some extent on the nature of the disease and can influence the analytical methods that are employed. For example, the DNA of some tissues may have a low incidence of moieties with the diagnostic pattern of methylation, which would limit the choice of analytic methodologies to those with high sensitivity for these molecular signatures.
Second, different diseases may require analysis of either regional or genome-wide epigenetic variation, with the choice depending on the predicted variation in the specific disease. The continuing increase in the number of “epigenetic” diseases means that the list of methods that are practical for the different diseases is also increasing.
Third, epigenetic variation can be a consequence or a cause of the disease. Therefore, use of strategies that can differentiate the role, or otherwise, of epigenetic variation in the causality of a disease is fundamental.
It might, for example, allow determination of whether epigenetic variation is a marker of disease progression, a potential therapeutic target, or a useful marker for assessing the efficiency of a therapy.
Although the new technologies have provided considerable insights into epigenetic aspects of disease, there is still considerably more work that needs to be carried out.
In particular, there is a great need for detailed descriptions of human DNA methylomes and for maps of histone modifications and nucleosome positions in healthy and diseased tissues.
A number of international projects and initiatives have been established to meet this need: the NIH Roadmap Epigenomics Program, the ENCODE Project, the AHEAD Project, and the Epigenomics NCBI browser, among others [2,3].
The availability of detailed epigenetic maps will be of enormous value to basic and applied research and will enable pharmacological research to focus on the most promising epigenetic targets.
This chapter summarizes some of the contemporary methods used to study epigenetics and highlights new methods and strategies that have considerable potential for future epigenetic and epigenomic studies.
2.2 DNA Methylation Analysis #
Methylation of cytosine bases in DNA is not only an important epigenetic modification of the genome but is also crucial to the regulation of many cellular processes.
DNA methylation is important in many eukaryotes for both normal biology and disease etiology [1]. Therefore, identifying which genomic sites are methylated and determining how this epigenetic mark is maintained or lost is vital to our understanding of epigenetics.
In recent years, the technology used for DNA methylation analysis has progressed substantially: previously, analyses were essentially limited to specific loci, but now they can be performed on a genome-wide scale to characterize the entire "methylome" with single-base-pair resolution [4].
The new wealth of profiling techniques raises the challenge of which is the most appropriate to select for a given experimental purpose.
Here, we list different methodologies available for analyzing DNA methylation and briefly compare their relative strengths and limitations [5]. We also discuss important considerations for data analysis.
2.2.1 Methylation-Sensitive Restriction Enzymes #
The identification of DNA methylation sites using methylation-sensitive restriction enzymes requires high-molecular-weight DNA and is limited by the target sequence of the chosen enzyme.
The use of restriction enzymes that are sensitive to CpG methylation within their cleavage recognition sites [6] is a relatively low-resolution method, but it can be useful when combined with genomic microarrays [7,8].
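As a rough illustration of how the enzyme's target sequence limits which sites can be interrogated, the following Python sketch predicts digestion fragments for a toy sequence. It assumes an enzyme such as HpaII, which recognizes CCGG and is blocked when the internal CpG cytosine is methylated; the function name, the toy sequence, and the methylated positions are hypothetical and are not taken from the chapter.

```python
def predict_fragments(seq, methylated_positions, site="CCGG", cut_offset=1):
    """Return fragment lengths after digestion with a methylation-sensitive enzyme.

    seq                  -- DNA sequence (one strand, 5'->3')
    methylated_positions -- set of 0-based indices of methylated cytosines
    site                 -- recognition sequence (assumed: CCGG)
    cut_offset           -- cut position within the site (assumed: C^CGG)
    """
    cuts = []
    start = seq.find(site)
    while start != -1:
        internal_c = start + 1  # cytosine of the CpG inside CCGG
        if internal_c not in methylated_positions:
            cuts.append(start + cut_offset)  # unmethylated site is cleaved
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


toy = "AAAACCGGTTTTTTCCGGAA"  # hypothetical sequence with two CCGG sites
print(predict_fragments(toy, set()))   # both sites cut
print(predict_fragments(toy, {15}))    # second site protected by methylation
```

Only cytosines that fall inside a recognition site contribute to the fragment pattern, which is why the resolution of this approach is tied to the enzyme chosen.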
2.2.2 Bisulfite Conversion of Unmethylated Cytosines, PCR and Sequencing #
Conversion of unmethylated sequences with bisulfite followed by PCR amplification and sequencing analyses provides an unbiased and sensitive alternative to the use of restriction enzymes.
This approach is therefore generally regarded as the “gold-standard technology” for detection of 5-methyl cytosine as it enables mapping of methylated sites at single-base-pair resolution [9].
The bisulfite method requires a prolonged incubation of the DNA sample with sodium bisulfite; during this period, unmethylated cytosines in the single-stranded DNA are deaminated to uracil.
However, the modified nucleoside 5-methyl cytosine is immune to transformation and, therefore, any cytosines that remain following bisulfite treatment must have been methylated.
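The logic of bisulfite-based methylation calling can be sketched in a few lines of Python. This is a simplified model, assuming complete conversion, no sequencing errors, and a read that aligns to the reference without gaps; after PCR, converted uracils are read as thymine. The function names and the toy sequence are illustrative only.

```python
def bisulfite_convert(seq, methylated):
    """Simulate bisulfite treatment followed by PCR.

    Unmethylated C is read as T; methylated C (0-based indices in
    `methylated`) stays C. All other bases are unchanged.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def call_methylation(reference, converted_read):
    """Compare a converted read with the reference, cytosine by cytosine.

    Returns {position: True/False}; True means the cytosine survived
    conversion and is therefore inferred to be methylated.
    """
    calls = {}
    for i, (ref, obs) in enumerate(zip(reference, converted_read)):
        if ref != "C":
            continue
        if obs == "C":
            calls[i] = True
        elif obs == "T":
            calls[i] = False
        # any other base is treated as uninformative and skipped
    return calls


reference = "ACGTCGAC"          # hypothetical reference fragment
read = bisulfite_convert(reference, methylated={1})
print(read)
print(call_methylation(reference, read))
```

In practice, incomplete conversion would appear as false methylation calls, which is one reason conversion efficiency is usually checked before interpreting the data.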
This method is currently one of the most popular approaches to methylation analysis and yields reliable, high-quality data [9,10].
The drawback to the method is that it is labor-intensive and is not suitable for screening large numbers of samples.
2.2.3 Comparative Genomic Hybridization (CGH) and Microarray Analysis #
A combination of CGH and microarray analysis can overcome the limitations of the bisulfite method. This combination can enable high-throughput methylation analyses.
The various advantages and disadvantages of this approach have been reviewed previously [11-13]. Recent high-throughput studies have used protein affinity to enrich for methylated sequences and then exploited these sequences as probes in genomic microarrays.
Methylated DNA fragments can be affinity-purified either with an anti-5-methyl cytosine antibody or by using the DNA-binding domain of a methyl-CpG-binding protein [14,15].
2.2.4 Bisulfite Treatment and PCR Single-Strand Conformation Polymorphism (SSCP) (BiPS) #
The combination of bisulfite treatment with PCR-based single-strand DNA conformation polymorphism (SSCP) analysis offers a potentially quantitative assay for methylation [16].
This combination approach, sometimes referred to as BiPS analysis, can be used for the rapid identification of the methylation status of multiple samples, for the quantification of methylation differences, and for the detection of methylation heterogeneity in amplified DNA fragments.
This technique has been successfully used to investigate the methylation status of the promoter region of the hMLH1, p16, and HIC1 genes in several cancer cell lines and colorectal cancer tissues [17].
2.2.5 Methylation-Sensitive Single-Nucleotide Primer Extension #
Methylation-sensitive single-nucleotide primer extension (MS-SNuPE) is a technique that can be used for rapid quantitation of methylation at individual CpG sites [18,19].
Treatment of genomic DNA with sodium bisulfite is used to convert unmethylated cytosine to uracil while leaving 5-methylcytosine unaltered.
Strand-specific PCR is performed to generate a DNA template for quantitative methylation analysis using MS-SNuPE. This protocol can be carried out using multiplex reactions, thus enabling the simultaneous quantification of multiple CpG sites in each assay.
2.2.6 Combined Bisulfite and Restriction Analysis #
The combined bisulfite and restriction analysis (COBRA) approach involves combining the bisulfite and restriction analysis protocols [20].
It is relatively simple to use while still retaining quantitative accuracy. Although both COBRA and MS-SNuPE are quantitative, each has a restriction: COBRA can only analyze a specific sequence because it relies on restriction enzymes, and MS-SNuPE is somewhat laborious.
MS-SNuPE has also been combined with microarray analysis to allow parallel detection of DNA methylation in cancer cells [19].
2.2.7 Quantitative Bisulfite Sequencing using Pyrosequencing Technology #
Quantitative bisulfite sequencing using pyrosequencing technology (QBSUPT) is based on the luminometric detection of pyrophosphate release following nucleotide incorporation [21].
The advantage of QBSUPT is that quantitative DNA methylation data are obtained directly from PCR products, without the need for cloning and sequencing a large number of clones.
However, QBSUPT cannot be used to analyze haplotype-specific DNA methylation patterns. Thus, while very sensitive, this assay may be more suited to laboratory diagnosis.
2.2.8 MethyLight Technology #
MethyLight technology provides a tool for the quantitative analysis of methylated DNA sequences via fluorescence detection in PCR-amplified samples [22].
This method has two particular advantages:
- first, the fluorescent probe can be designed to detect specific DNA methylation patterns, not simply to discriminate methylated from unmethylated sequences;
- second, it has the potential ability to rapidly screen hundreds or even thousands of samples.
2.2.9 Quantitative Analysis of Methylated Alleles (QAMA) #
QAMA is a quantitative variation of MethyLight that uses TaqMan probes based on minor groove binder (MGB) technology [23].
QAMA has the main advantage of being simple to set up, making it suitable for high-throughput methylation analyses.
2.2.10 DNA Methylation Analysis by Pyrosequencing #
Pyrosequencing is a replication-based sequencing method in which addition of the correct nucleotide to immobilized template DNA is signaled by a photometrically detectable reaction.
This method has been adapted to quantify methylation of CpG sites. The template DNA is treated with bisulfite and amplified by PCR before sequencing; the ratio of T and C residues at each CpG is then used to quantify methylation.
Pyrosequencing offers a high-resolution and quantitatively accurate measurement of methylation of closely positioned CpGs [24].
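One way to express the ratio of C and T at a single CpG is as a methylation percentage. The sketch below assumes that the C and T signal intensities (for example, peak heights) have already been extracted for that position; the function name and the example values are hypothetical.

```python
def methylation_percent(c_signal, t_signal):
    """Percentage methylation at one CpG from C and T signal intensities.

    After bisulfite treatment, C reports methylated templates and
    T reports unmethylated templates at the same position.
    """
    total = c_signal + t_signal
    if total <= 0:
        raise ValueError("no signal at this CpG position")
    return 100.0 * c_signal / total


# Hypothetical peak heights for three neighbouring CpGs: (C, T)
peaks = [(30, 70), (55, 45), (0, 120)]
print([methylation_percent(c, t) for c, t in peaks])
```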
2.2.11 Matrix-Assisted Laser Desorption Ionization Time-of-Flight Mass Spectrometry #
Tost et al. [25] described a method using matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF) mass spectrometry for analysis and quantification of methylation at CpGs.
Although the method requires gene-specific amplification, and should therefore be considered a candidate gene method, it is amenable to automation as it can make use of the EpiTYPER platform developed by Sequenom.
EpiTYPER can be used to determine methylation status following gene-specific amplification of bisulfite-treated DNA followed by in vitro transcription, base-specific RNA cleavage and MALDI-TOF analysis [26].
Although it is not a genome-wide technology, it is quantitative for multiple CpG dinucleotides for large numbers of gene loci and can be reliably applied to pooled DNA samples to obtain group averages for valuable samples.
2.2.12 New Technologies #
Several second-generation sequencing platforms became available in 2007 and were further developed with the launch of the first single-molecule DNA sequencer (Helicos Biosciences) in 2008 [27].
These new sequencing tools have been applied to epigenetic research, for example, in studies of DNA methylation.
Undoubtedly, future developments of these technologies hold the tantalizing prospect of high-throughput sequencing to identify DNA methylation patterns across the whole mammalian genome, possibly even opening up the prospect of genotyping individual cancers to aid the application of custom-designed cancer therapies [28].
2.2.13 Computational Tools #
The development of computational tools and resources for DNA methylation analysis is accelerating rapidly [29].
Sequence-based analyses involve alignment to a reference genome, collapsing of clonal reads, read counts or bisulfite-based analysis [30], and further data analysis.
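Two of the steps named above, collapsing of clonal reads and read counting, can be sketched as follows. This is a minimal illustration that assumes alignment has already been done and that each aligned read is represented as a (chromosome, start, strand) tuple; the representation, the function names, and the bin size are assumptions made for the example, not a description of any particular tool.

```python
from collections import Counter


def collapse_clonal_reads(alignments):
    """Keep one read per (chromosome, start, strand).

    Reads sharing all three values are treated as clonal (PCR duplicate)
    copies; the first occurrence is kept and the order is preserved.
    """
    seen = set()
    unique = []
    for aln in alignments:
        if aln not in seen:
            seen.add(aln)
            unique.append(aln)
    return unique


def count_reads_per_bin(alignments, bin_size=1000):
    """Count reads in fixed-width genomic bins keyed by (chromosome, bin index)."""
    return Counter((chrom, start // bin_size) for chrom, start, _strand in alignments)


# Hypothetical aligned reads
reads = [("chr1", 100, "+"), ("chr1", 100, "+"), ("chr1", 100, "-"), ("chr1", 1500, "+")]
unique_reads = collapse_clonal_reads(reads)
print(unique_reads)
print(count_reads_per_bin(unique_reads))
```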
Comparison of the relative strengths and weaknesses of the various methods for DNA methylation analysis is hampered by their complexity and diversity.
Inevitably, choice of method is based on pragmatic grounds, for example, the number of samples, the quality and quantity of DNA samples, the desired coverage of the genome, and the required resolution.
2.3 Histone Modification Analysis #
Histones are abundant, small basic proteins that associate with the DNA in the eukaryotic nucleus to form chromatin.
The four core histones (H2A, H2B, H3 and H4) can show substantial modifications of 20-40 N-terminal amino acids that are highly conserved despite playing no structural role.
The modifications are thought to constitute a histone code by which the cell encodes various chromatin conformations and controls gene expression states.
The analysis of these modified histones can be used as a model for the dissection of complex epigenetic modification patterns and for investigation of their molecular functions.
In this section, we review the techniques that have been used to decipher these complex histone modification patterns.
Post-translational modification (PTM) of proteins plays a key role in regulating the biological function of many polypeptides.
Initially, analyses of the modification status were performed using either a specialized gel system or a radioactive precursor molecule followed by complete protein hydrolysis and identification of the labeled amino acid [31-35].
This approach showed that histones could be modified in vivo by acetylation, methylation or phosphorylation [31,36,37].
As most of the modifications occurred at the N-terminus of the histone, it was feasible to map the site of some modifications using Edman degradation [38].
However, this is only possible when histones can be purified in sufficient quantities and with a high purity.
The purification process is labor-intensive and involves multiple steps; this precludes the possibility of analyzing histone modifications from small numbers of cells or of mapping post-translational modifications at specific loci.
Mass spectrometry is the method of choice for analyzing PTM in histones [39-42], as each modification adds a defined mass to the molecule.
The high resolution of modern mass spectrometers and recent developments in soft ionization techniques have facilitated the mapping of posttranslational modifications.
As a result, these high-resolution methods have enabled much faster detection of PTMs and have shown that such modifications are considerably more abundant than expected.
The increased complexity of the proteome revealed by these analyses presents major challenges both for investigation and for the processing of the raw data.
The mass spectrometry methods currently used to precisely map a modified residue are very elaborate and require enrichment of the peptides that carry particular modifications [43-46].
Different molecules can carry several modifications that localize on a single peptide within a protease digest [47-50].
These short stretches of dense modifications have been termed Eukaryotic linear motifs (ELMs) and are thought to play a critical role in regulating the global function of proteins [51].
The high level of sequence conservation within these short ELMs also supports this idea.
Many ELMs contain a number of amino acids that can be modified and the position of each modification has to be precisely determined [51].
Identification of each modification at different sites within a highly modified ELM is laborious and also hampered by the fact that some modifications result in similar mass differences.
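The problem of near-identical mass differences can be made concrete with a small lookup. The sketch below uses commonly tabulated monoisotopic mass shifts, which are supplied here as assumptions rather than values from the chapter, and lists the modifications compatible with an observed shift at a given mass tolerance. Acetylation and trimethylation differ by only about 0.036 Da, so they are separable only when the tolerance is tight.

```python
# Commonly tabulated monoisotopic mass shifts in daltons (assumed values,
# not from the chapter; check a curated resource before relying on them).
MASS_SHIFTS = {
    "methylation": 14.01565,
    "dimethylation": 28.03130,
    "trimethylation": 42.04695,
    "acetylation": 42.01057,
    "phosphorylation": 79.96633,
}


def candidate_modifications(observed_shift, tolerance_da):
    """Return, sorted by name, the modifications within tolerance of the observed shift."""
    return sorted(
        name
        for name, shift in MASS_SHIFTS.items()
        if abs(observed_shift - shift) <= tolerance_da
    )


# A loose tolerance cannot separate acetylation from trimethylation...
print(candidate_modifications(42.03, tolerance_da=0.05))
# ...whereas a tight tolerance can.
print(candidate_modifications(42.0106, tolerance_da=0.01))
```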
A variety of different methods are available to study complex histone modification patterns; these range from “bottom-up approaches” to produce detailed and quantitative measurements of particular histone modifications, to “top-down approaches” aimed at elucidating the interactions of different modifications [52].
The use of a range of methods should greatly facilitate analysis of complex modification patterns and provide a greater insight into the biological roles of these histone modifications.
Many of the methods used to analyze histone modifications can equally be applied to other types of modified protein that can function as integrators in multiple signaling pathways.
The information on epigenomic analyses, including histone modifications using new technology such as next-generation sequencing (NGS), is reviewed below.
2.4 Non-Coding RNA Analysis: MicroRNA #
There is increasing evidence that small non-coding RNAs, such as microRNA, and long non-coding RNAs, such as lincRNA, can regulate gene expression.
Mature microRNAs (miRNAs) are very small molecules, 19-25 nucleotides (nt), which poses a problem for their quantification.
As small RNAs are less efficiently precipitated in ethanol, it is necessary to avoid resuspension in ethanol when using the standard Trizol protocol for RNA isolation.
On the other hand, miRNAs appear to be more stable than longer RNAs and, consequently, in degraded samples it is still possible to obtain readable miRNA expression data.
miRNAs have been reported to have greater stability than mRNAs in samples obtained from tissues which were fixed with formalin and paraffin embedded [53-55].
However, the intrinsic characteristics of miRNAs make production of miRNA expression profiles very problematic.
For example, mature miRNAs lack common sequence features, such as a poly-A tail or 5' cap, that can be used to drive selective purification.
As mentioned above, the mature miRNAs are very small, which reduces the effectiveness of most conventional biological amplification methods.
This problem arises because of poor specificity in primer binding. As a consequence, standard real-time PCR methods can only be applied to miRNA precursors.
Furthermore, sequence heterogeneity among the miRNAs with respect to GC content results in a wide range of optimal melting temperatures for these nucleic acid duplexes and hampers the simultaneous detection of multiple miRNAs.
An additional problem for the specificity of miRNA detection arises from the close sequence similarity of miRNAs of the same family (mature miRNA, pri-miRNA, and pre-miRNA) and of the target sequence.
Currently, various methodologies have been adapted to detect miRNAs, including
- Northern blot analysis with radiolabeled probes [56,57],
- microarray-based [58] and PCR-based analyses [59],
- single molecule detection in a liquid phase, in situ hybridization [60,61],
- and high-throughput sequencing [62].
However, all of these methods have inherent limitations and the choice of method for miRNA detection depends mainly on specific experimental conditions.
Ideally, an miRNA profiling method should fulfill the following requirements: sufficient sensitivity to allow quantitative analysis of miRNA levels, even with small amounts of starting material; sufficient specificity to distinguish single-nucleotide differences between miRNAs; high reproducibility; the capacity to process many samples at one time; and ease of use without the need for expensive reagents or equipment [63].
miRNAs were first identified using Northern blotting [64-66]. Small RNA molecules can be detected with a modified version of the standard protocol for Northern blotting in which high-percentage urea-acrylamide gels are mainly used; this modified approach can detect small RNA molecules that are approximately 100 times smaller than the average coding RNA.
There are three main techniques for detecting and quantifying miRNA in tissue samples:
- cloning of miRNA;
- PCR-based detection;
- and hybridization with selective probes.
Initially, cloning was the main approach used as it offers advantages for discovery of new miRNAs not predicted from bioinformatic analysis and for sequencing the miRNAs [65,67,68].
However, cloning is less precise than the other methods for quantifying miRNAs. The PCR-based technique can detect low copy numbers of both the precursor and mature forms of miRNAs with high sensitivity and specificity [69].
It is relatively inexpensive, can be used for clinical samples, and can work with minute amounts of RNA.
Various hybridization techniques can be used on miRNAs, namely, Northern blotting, bead-based flow cytometry, in situ hybridization and microarray [70,71].
Northern blotting using radioactive probes is very sensitive; however, it is very time-consuming, is not practical in large clinical studies for detecting expression of hundreds of miRNAs, and requires large amounts of total RNA from each sample.
Following their initial discovery, the number of miRNAs quickly increased and they were shown to be present in all eukaryotic species [66,67,72].
In order to analyze a large number of miRNAs in many patients, it is essential to have a technique that can simultaneously process multiple miRNAs using the relatively small amounts of RNA that can be obtained from each patient.
Designing probes for miRNAs is complicated by their short length and their low abundance. As each miRNA is only 19-25 nt long, the probe is almost exclusively determined by the sequence of the miRNA itself, which necessitates a different annealing temperature for each probe and miRNA interaction.
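The dependence of annealing temperature on probe sequence can be illustrated with a rule-of-thumb calculation. The sketch below uses the Wallace rule (2 °C per A/T or A/U pair, 4 °C per G/C pair), which is a crude approximation for short oligonucleotides and is introduced here only as an assumption; real probe design would normally use nearest-neighbor thermodynamic models. The two sequences are invented 22-nt examples, not real miRNAs.

```python
def gc_content(seq):
    """Fraction of G and C bases in a DNA or RNA sequence."""
    seq = seq.upper().replace("U", "T")
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rough melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper().replace("U", "T")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Hypothetical 22-nt sequences at the two extremes of GC content
examples = {
    "low_gc": "UUAAUAUUAAUAUUAAUAUUAA",
    "high_gc": "GCGGCCGCGGGCCGCGGCCGGC",
}
for name, seq in examples.items():
    print(name, round(gc_content(seq), 2), wallace_tm(seq))
```

Even under this simple rule, sequences of identical length span a wide temperature range, which is the difficulty that a single hybridization condition has to accommodate.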
Microarray technology was developed in 1995 and has been applied to miRNA quantification [71,73].
In brief, microarrays are based on multiple hybridizations in parallel, using a glass or quartz support where probes have either been spotted or synthesized by photochemical synthesis [74-76].
The ability to include a high density of spots on an array enables a high number of genes to be analyzed simultaneously [76,77].
Three approaches are in general use for detecting nucleic acids such as DNA or RNA on an array platform.
The first, which is common for custom arrays, uses glass slides and is based on the spotting of unmodified oligonucleotides over the slide [78].
The second also uses glass slides and is based on the deposition of probes on the slide. The distinction is that the 5' terminus of the probe is cross-linked to the matrix on the glass. This allows the spotting of a much higher number of probes on these slides.
In the third method, probes are photochemically synthesized directly on a quartz surface, allowing the number of probes to rise to millions on a small and compact area [75].
Usually, but not always, the first two methods compare two samples on each slide (one used as reference) that are stained in different colors.
The third method uses single-color hybridization where each slide is hybridized with only one sample.
Most microarrays use DNA oligo spotting; a few use locked nucleic acid (LNA) probes, which may increase the affinity between probes and miRNAs and thereby achieve more uniform hybridization conditions across different probes.
Ideally, microarray-based detection of miRNAs should avoid manipulation of the samples, such as enrichment of low-molecular-weight RNA species and amplification of miRNAs.
Additionally, it is feasible to develop microarrays able to discriminate the two predominant forms of miRNAs (precursor and mature).
The International Human Epigenome Consortium (IHEC) recommended that the identity and abundance of all non-coding RNA species in a cell type should be determined and suggested that this should be accomplished by RNA-seq by next-generation DNA sequencing after isolation of large or small RNA species.
2.5 Analysis of Genome DNA Replication Program Based on DNA Replication Timing #
Chromosomal DNA replication is essential for normal cellular division and also has a significant role in the maintenance of genomic integrity.
Genomic instability increases when DNA replication errors occur and, thus, mistakes in replication may be an important factor in the etiology of cancers and neuronal disorders.
Replication in eukaryotes is initiated from discrete genomic regions, termed origins. The replication program is strict within a cell or tissue type but can vary among tissues and during development.
The genetic program that controls activation of replication origins in mammalian cells awaits elucidation.
Nevertheless, there is evidence that the specification of replication sites and the timing of replication are responsive to epigenetic modifications.
Over the last decade, many new techniques have been developed and applied to analysis of DNA replication timing in the human genome.
These techniques have provided significant insights into cell cycle controls, human chromosome structure, and the role of epigenetic changes to the genome with respect to DNA replication.
In this section, we describe the methods that are currently employed for determining the spatiotemporal regulation of DNA replication in the human genome (DNA replication timing).
Two approaches are generally used to investigate DNA replication timing: fluorescence in situ hybridization (FISH) and PCR [79-82].
The FISH method is based on the cytogenetic discrimination of replicated (two double signals for autosomal loci; DD) and unreplicated loci (two single signals; SS) using DNA probes that are labeled with a fluorescent dye [79].
By comparing the frequencies of the two types of signal, the relative replication timing of each locus can be determined.
However, the method is absolutely dependent on the assumption that replicated loci will provide DD-type FISH signals, that is, the replicated signals created by passage of the replication fork will separate sufficiently to be seen as a DD signal.
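The comparison of signal frequencies can be sketched as a simple ranking. The example below assumes that, in an asynchronous cell population, a higher fraction of DD nuclei indicates earlier replication of that locus, and it ignores nuclei with mixed (one single, one double) signals; the locus names and counts are hypothetical.

```python
def doublet_fraction(ss_count, dd_count):
    """Fraction of scored nuclei showing replicated (DD) signals for a locus."""
    total = ss_count + dd_count
    if total == 0:
        raise ValueError("no nuclei scored for this locus")
    return dd_count / total


def rank_by_replication_timing(counts):
    """Order loci from earliest to latest replicating.

    counts -- {locus: (ss_count, dd_count)}
    Assumes a higher DD fraction means earlier replication.
    """
    return sorted(
        counts,
        key=lambda locus: doublet_fraction(*counts[locus]),
        reverse=True,
    )


# Hypothetical scoring of 100 nuclei per probe: (SS, DD)
scored = {"locusA": (30, 70), "locusB": (80, 20), "locusC": (55, 45)}
print(rank_by_replication_timing(scored))
```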
The PCR-based method involves labeling cells in exponential growth with BrdU for 60-90 min and then fractionating them by flow cytometry.
Typically, this allows discrimination of six cell cycle fractions: G1, four successive S phase stages, and G2/M (mitotic) [80-82].
Samples containing equal numbers of cells from each cell-cycle fraction are collected, and newly replicated DNA labeled with BrdU is extracted and purified from each fraction.
Whether or not a locus has commenced or completed replication can be determined by quantitative PCR of the newly replicated DNA.
This approach has been exploited to provide replication timings for sequence tagged sites on human chromosomes 11q and 21q [81,82] and identified Mb-sized zones that replicated early or late in S phase (i.e. early/late transition zones).
The early zones were found to be more GC-rich and gene-rich than the late zones, and the early/late transitions occurred primarily in genome regions that showed rapid switches in the relative GC content in the chromatin [81,82].
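Switches in relative GC content of the kind described here are usually visualized with a windowed GC calculation. The following sketch shows the idea on a toy sequence; the window and step sizes, the function names, and the sequence itself are illustrative assumptions, and real analyses would use windows of many kilobases.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq, window, step):
    """GC fraction in windows of fixed width, as (start, fraction) pairs."""
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Toy sequence with an abrupt switch from GC-rich to AT-rich
toy = "GGCCGGCCAATTAATT"
print(windowed_gc(toy, window=8, step=8))
```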
Woodfine et al. [83] performed the first microarray-based analysis to map replication timing in the human genome. They adapted the comparative genomic hybridization technique, which had been developed to assess genomic copy-number differences in cancer cells.
Relative replication times can be inferred by measuring the relative amounts of different sequences in a population of S-phase cells compared to a non-replicating G1 genome.
In this method, S-phase cells in an asynchronously growing human cell culture are isolated and their DNA extracted.
The DNA is color-labeled and then mixed with DNA from G1 phase cells that has been differentially color-labeled.
The combined DNA sample is hybridized to an array of genomic sequences and, after normalization of the data, the relative fluorescence intensities of the S-phase DNA at each array spot (the S to G1 replication timing ratio) provide a measure of replication timing.
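A minimal version of this calculation is sketched below. The normalization used by the original authors is not specified in this summary, so the example simply scales each channel by its total intensity before taking the per-probe ratio; that choice, the function name, and the probe names and intensities are all assumptions made for illustration.

```python
def timing_ratios(s_intensity, g1_intensity):
    """Per-probe S:G1 ratio after scaling each channel by its total signal.

    s_intensity, g1_intensity -- {probe: fluorescence intensity}
    A higher ratio indicates that the sequence is over-represented in
    S-phase DNA, i.e. that it tends to replicate earlier.
    """
    s_total = sum(s_intensity.values())
    g1_total = sum(g1_intensity.values())
    return {
        probe: (s_intensity[probe] / s_total) / (g1_intensity[probe] / g1_total)
        for probe in s_intensity
    }


# Hypothetical intensities for two array spots
s_phase = {"probe_early": 200.0, "probe_late": 100.0}
g1_phase = {"probe_early": 100.0, "probe_late": 100.0}
ratios = timing_ratios(s_phase, g1_phase)
print(sorted(ratios, key=ratios.get, reverse=True))  # earliest first
```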
Comparison of the data obtained by this method with those from the nascent-strand quantitative PCR methods described above [81-84] showed that this new approach provided estimates of replication timing that were consistent with those obtained earlier [82,83].
White et al. [85] modified the experimental approach of Woodfine et al. [83] by comparing the representation of genomic sequences in newly replicated DNA isolated from early S-phase cells with that from late S-phase cells.
In this way, they obtained a replication timing ratio for early S to late S.
Although their measure has a different basis from the S-phase to G1 ratio of Woodfine et al. [83], the results provide a similar description of replication timing.
For example, a replication profile for chromosome 22 in a lymphoblastoid cell line obtained by White et al. [85] was consistent with that of Woodfine et al. [83].
To date, high-resolution analyses have shown a positive correlation between replication timing and a range of genomic parameters such as GC content, gene density and transcriptional activity [82,83,85].
DNA replication errors have been implicated in the etiology of many diseases [86-89].
One possible mechanism for this relationship is that disease-related reprogramming of the epigenome might depend on impaired regulation of replication timing patterns [90].
Thus, for example, chromosomal rearrangements in cancers have been reported to be associated with replication timing changes in translocation breakpoints [91,92].
Likewise, peripheral blood cells from prostate cancer patients have an altered pattern of replication accompanied by aneuploidy that distinguishes them from individuals with benign prostate hyperplasia (a common disorder in elderly men).
These cellular characteristics have been suggested to be a better marker for prostate cancer than use of the blood marker, prostate-specific antigen (PSA) [93,94].
Analyses of changes in replication timing in the human genome have shown that the tumor suppressor gene p53 plays a role in its regulation through the control of cell cycle checkpoints [95].
Thus, in cancer cells, the normal order of DNA replication is altered: regions that normally replicate late sometimes replicate early, and vice versa [84,91-93,96-98].
Replication timing has also been shown to change during development, differentiation and tumorigenesis; moreover, the structure of the chromatin may also change.
The model illustrated in Figure 2.1 shows a possible mode of interaction of chromatin conformation, replication timing and the expression of genes, including oncogenes, in an early/late-switch region of replication timing (R/G-chromosome band boundary) [99].
For example, the replication timing environment of an oncogene (or a tumor suppressor gene) located in an early/late-switch region of replication timing may change from intermediate replication, between early and late S phase, to early replication timing (or late replication timing) by an increase (or decrease) in the number of early replication origins at the edge of an early replication zone (Figure 2.1B).
In addition, the chromatin environment of such an oncogene (or tumor suppressor gene) may also change from that of an R/G-chromosome band boundary to an R band (or from that of an R/G-chromosome band boundary to a G band).
Stalling of the replication fork in the vicinity of oncogenes might also induce translocation events, thereby altering the structure or the local environment of the oncogenes and affecting their function (Figures 2.1A, 2.1B) [99].
The interrelationship of these various factors suggests that including replication timing assays in an epigenetic investigation might, in future, allow much earlier cancer detection than is possible today [5,99,100].
2.6 Strategy for Epigenomic Investigation Based on Chromosomal Band Structures #
The various methods for genome-wide epigenetic analyses described in the above sections are summarized in Table 2.1.
The replication timing of genes along the entire lengths of human chromosomes 11q and 21q has been described previously; these analyses showed that cancer-related genes, including several oncogenes, are concentrated in regions showing transition from early to late replication timing [82,84,96,97].
Scrutiny of the updated replication timing map for human chromosome 11q found that amplicons, gene amplifications associated with cancer, are located in the early/late switch regions of replication timing in human cell lines [84].
These transition regions also contain genes related to neural diseases, such as APP, associated with familial Alzheimer’s disease (AD1), and SOD1, associated with familial amyotrophic lateral sclerosis (ALS1) [82].
Several neural disease genes are present in chromosomal regions with early/late transitions [82,96].
Interestingly, in metaphase and interphase nuclei, early-replicating zones have a looser chromatin structure, whereas late-replication zones have compact chromatin [101-104].
Therefore, transitions in chromatin compaction coincide with replication transition regions.
In terminally differentiated cells, such as neurons, it is expected that the level of chromatin compaction established during the final round of DNA replication will be maintained.
Transitions in chromatin compaction within a gene might lead to reduced genomic stability, and may also increase susceptibility to agents that can influence gene expression.
Thus, the probability of epimutation, such as instability of chromatin structures and DNA damage (including DNA rearrangements), appears to be greater in replication transition regions than elsewhere in the genome [82,84,96,97,105-107] (Figure 2.1A).
It is likely that transition zones are subject to tight regulation, as changing their positions would affect the replication timing patterns of several flanking replicons.
There is strong evidence that transition zones are conserved among different ES cell lines [108].
During development, transition zones may therefore be targets for chromatin-modifying enzymes to facilitate rapid reconfiguration and establishment of new replication timing patterns.
Early and late replication zones tend to be located in different regions of the nucleus during S phase; it is possible that transition regions flanking these replication zones might be subject to dynamic reorganization or relocation during replication fork movement.
The transition zones for replication timing are known to be associated with genomic instability, which is suspected to be involved in the etiology of human diseases such as cancer.
Common fragile sites (CFSs) represent the best-known examples of regions of the human genome that break under replication stress. CFSs are associated with very large genes [109] and are frequently found at R/G band boundaries [110].
The human genome appears to have a large excess of so-called “dormant” or “backup” origins, and these may be used to rescue stalled replication forks. Interestingly, “spare” origins appear to be absent from R/G band boundaries [111,112].
In conclusion, early/late-switch regions of replication timing generally correspond with transitions in relative GC content, are correlated with R/G chromosome band boundaries, and are suspected of being “unstable” genomic regions that have increased susceptibility to epigenetic mutation, as well as DNA damage (Figure 2.2A) [5,99].
There is a clear need for further epigenomic analysis on chromosomal band structures, in particular, to obtain a greater understanding of these epimutation-sensitive regions at the genome sequence level (Figure 2.2B, Table 2.2).
Before performing epigenomic analysis using DNA methylation and histone modification, we propose clarifying the direct correspondence between chromosomal bands (R-, T-, and G-bands) and genome sequence by analyzing DNA replication timing, GC% and related features (Figures 2.2A, 2.2B).
Additionally, we suggest that epigenomic analysis focused on chromosomal band structures (the boundaries of which were identified as epimutation-sensitive genomic regions at the genome sequence level) will provide considerable insights into normal and disease conditions. In the future, this will be a promising strategy for epigenetic analysis.
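As a rough illustration of the GC% part of this proposal, the sketch below computes GC content in fixed windows along a sequence and reports where the profile changes most sharply. It is a toy example, not the authors' procedure: the function names, the window size and the toy sequence are all assumptions made for illustration, and a real analysis would combine such a profile with replication timing data.

```python
def gc_percent(seq):
    """Return the GC content of a DNA string as a percentage (0.0 for empty input)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


def windowed_gc(seq, window):
    """GC% for consecutive non-overlapping windows; a trailing partial window is dropped."""
    return [gc_percent(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, window)]


def sharpest_transition(values):
    """Index i at which |values[i + 1] - values[i]| is largest (first index on ties)."""
    if len(values) < 2:
        raise ValueError("need at least two windows")
    diffs = [abs(values[i + 1] - values[i]) for i in range(len(values) - 1)]
    return diffs.index(max(diffs))


# Hypothetical toy sequence: a GC-rich block followed by an AT-rich block.
toy = "GC" * 20 + "AT" * 20
profile = windowed_gc(toy, window=10)
boundary = sharpest_transition(profile)  # boundary lies between window 3 and window 4
```

In this toy case the first four windows are entirely GC and the last four entirely AT, so the sharpest change falls between the fourth and fifth windows.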
2.7 Overview of Recent Epigenetic Genome-Wide or Bioinformatic Studies and Strategies #
Over the last few years, genome-wide association studies (GWASs) have successfully identified loci associated with common diseases; however, the basis of many diseases still remains to be determined.
The development of new genomic technologies has opened up the possibility of performing similar genome-wide studies to GWASs but with the aim of identifying epigenetic variations, particularly with respect to DNA methylation, that are associated with disease.
Although such epigenome-wide association studies (EWASs) will provide valuable new information, they do pose specific problems that are not inherent in GWASs.
Performance of an EWAS is predicated on the assumption that it will be as successful as a GWAS in identifying disease-associated variations.
However, the differences between the epigenome and the genome influence the nature of the study design.
For example, tissue-specific epigenetic modifications or epigenetic changes that occur downstream of the disease initiation step, might be important considerations in an EWAS for determining the cohorts and samples that should be analyzed.
Although it is technically feasible to use array- and sequencing-based technologies in an EWAS, the computational and statistical methods required to analyze the data still require further development [113].
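One statistical issue that any study testing many sites at once has to face is multiple testing. The sketch below shows the Benjamini-Hochberg adjustment, a widely used way of controlling the false discovery rate. It is offered only as an example of the kind of method involved, not as the approach recommended in [113]; the p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-site p-values from a case/control methylation comparison.
pvals = [0.01, 0.04, 0.03, 0.005]
qvals = benjamini_hochberg(pvals)
```

Sites whose adjusted value falls below a chosen threshold would then be taken forward; the threshold itself is a study design decision.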
Several next-generation sequencing (NGS) platforms harness the power of massively parallel short-read DNA sequencing (MPSS) to analyze genomes with considerable precision. These methods can be applied to genome-wide epigenomic studies and they offer a potentially revolutionary change in nucleic acid analysis.
The ability to sequence complete genomes will undoubtedly change the types of question that can be asked in many disciplines of biology.
Recent excellent reviews provide a comprehensive description of the chemistry and technology behind the leading NGS platforms [114].
In this section, we discuss the application of NGS in epigenomic research, with a particular focus on chromatin immunoprecipitation combined with MPSS (ChIP sequencing or “ChIP-seq”).
ChIP-seq offers the possibility of genome-wide profiling of DNA-binding proteins, histone modifications or nucleosomes. This method has advantages over the longer-established ChIP-chip (chromatin immunoprecipitation combined with microarray) technique [52].
For example, although arrays can be tiled at a high density, they require large numbers of probes and are expensive [115].
The hybridization process also imposes a fundamental limitation in the resolution of the arrays.
ChIP-seq does not suffer from the “noise” generated by the hybridization step in ChIP-chip, which is complex and dependent on many factors, including the GC content, length, concentration and secondary structure of the target and probe sequences.
Cross-hybridization between imperfectly matched sequences can occur frequently and contribute to the noise. In addition, the intensity signal measured on an array might not be linear over its entire range, and its dynamic range is limited below and above saturation points.
A recent study reported that distinct and biologically meaningful peaks seen in ChIP-seq were obscured when the same experiment was conducted with ChIP-chip [116].
Genome coverage using ChIP-seq is not limited by the selection of probe sequences on the array.
This is an important constraint in microarray analysis of repetitive regions of the genome, which are often “masked out” on the arrays.
As a consequence, heterochromatin and microsatellites are better investigated with ChIP-seq.
Sequence variations within repeat elements can be identified and used to align the reads in the genome; unique sequences that flank repeats are similarly helpful [117].
The main disadvantages of ChIP-seq are cost and availability. Several groups have successfully developed and applied their own protocols for library construction, which has substantially lowered that part of the cost.
For high-resolution profiling of an entire large genome, ChIP-seq can already be less expensive than ChIP-chip. However, this depends on the genome size and the level of sequencing detail required; a ChIP-chip experiment on selected regions using a customized microarray may yield as much biologically meaningful data.
The recent decrease in sequencing cost per base-pair has not had as large an effect on ChIP-seq as on other applications, since the decrease has come as much from increased read lengths as from the number of sequenced fragments. The gain in the fraction of reads that can be uniquely aligned to the genome declines rapidly after 25-35 bp and is marginal beyond 70-100 nucleotides [118].
However, as the cost of sequencing decreases and institutional support for sequencing platforms grows, ChIP-seq is likely to become the method of choice for nearly all ChIP experiments in the near future.
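The diminishing return from longer reads can be illustrated on a toy scale by asking what fraction of positions in a sequence start a k-mer that occurs only once. This is a simplified sketch under stated assumptions: it counts exact matches on one strand of a hypothetical ten-base sequence, whereas real aligners tolerate mismatches and work on both strands of a full genome.

```python
from collections import Counter


def unique_fraction(genome, k):
    """Fraction of start positions whose k-mer occurs exactly once in genome."""
    n = len(genome) - k + 1
    if n <= 0:
        raise ValueError("k is longer than the genome")
    counts = Counter(genome[i:i + k] for i in range(n))
    unique = sum(1 for i in range(n) if counts[genome[i:i + k]] == 1)
    return unique / n


# Hypothetical toy sequence containing a short repeat (ACGT occurs twice).
toy_genome = "ACGTTACGTG"
fractions = {k: unique_fraction(toy_genome, k) for k in (2, 4, 5)}
```

Here the unique fraction rises from 3/9 at k = 2 to 5/7 at k = 4 and reaches 1.0 at k = 5, after which longer k-mers cannot add anything. The same saturation, at much larger read lengths, underlies the figures quoted from [118].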
ChIP-seq analyses have been performed on multiple transcription factors with their transcriptional co-regulators, boundary elements, numerous types of histone modifications, histone variants, nucleosome occupancy, DNA methylation patterns and gene transcription [119].
The data from these analyses are providing fresh insights into complex transcriptional regulatory networks.
Furthermore, “chromatin signatures”, characteristic chromatin structures in particular genomic regions, enable genome annotation based on predicting histone modifications and an overall landscape of the epigenome in human cells [52,119].
In addition, identification of the specific chromatin signatures associated with genomic features such as enhancers, insulators, boundary elements and promoters, will provide another means of annotating complex genomes.
NGS technologies provide an increasing ability to query multiple genomic features, which were previously too technically challenging and costly; this inevitably has raised expectations and ambitions, as exemplified by the published goals of the International Human Epigenome Consortium.
Histone PTMs influence gene expression patterns and genome function by establishing and orchestrating DNA-based biological processes [120].
PTMs can either directly affect the structure of chromatin or can recruit co-factors that recognize histone marks and thereby adjust local chromatin structures and their behavior.
A comprehensive and high-resolution analysis of histone modifications across the human genome will help our understanding of the functional correlation of various PTMs with processes such as transcription, DNA repair and DNA replication [121,122].
Use of modification-specific antibodies in ChIP has revolutionized the ability to ascribe biological functions to histone modifications.
ChIP-on-chip has allowed a description of the global distribution and dynamics of various histone modifications [123].
However, prior to NGS, it had not been practical to map multiple modifications in an unbiased genomic fashion.
One of the first applications of ChIP-seq was in the analysis of the genome-wide distribution of histone modifications [119].
This study, and others that followed, exemplified the newfound feasibility and utility of obtaining collections of comprehensive genomic datasets.
Twenty histone methylation sites in human T-cells were mapped [124], while five histone methylation patterns in pluripotent and lineage-committed mouse cells were described [125].
Such genome-wide analyses have revealed associations between specific modified histones and gene activity as well as the spatial and combinatorial relationship between different types of histone modifications.
Moreover, dynamic changes in histone modification patterns during cellular differentiation and allele-specific histone modifications were revealed [125].
These initial ChIP-seq studies, in combination with more recent analyses examining the distribution of other types of histone modifications, have revealed that specific genomic features are associated with distinct types of chromatin signatures [126,127].
Such genome-wide chromatin landscape maps have subsequently been exploited as a tool for defining and predicting novel transcription units, enhancers, promoters, and most recently ncRNAs in previously unannotated regions of the human genome [128].
In future, NGS technologies will undoubtedly find widespread use and relevance in many different areas of biology, far beyond the test-bed of epigenetics.
Recent studies of the epigenome have shown that many promoters and enhancers have distinctive chromatin signatures. These characteristic signatures can be used to search for and map the regulatory elements of the genome.
Won et al. [129] used this approach in a supervised learning method involving a trained Hidden Markov model (HMM) based on histone modification data for known promoters and enhancers. They used the trained HMMs to identify promoter or enhancer-like sequences in the human genome [129].
In a somewhat similar manner, Ernst and Kellis [130] sought to identify biologically meaningful combinations of epigenetic marks in the genome of human T-cells.
They defined these genomic regions as having “spatially coherent and biologically meaningful chromatin mark combinations”, and applied a multivariate HMM analysis to search for them.
Fifty-one distinct chromatin states were identified by the analysis, including those associated with promoters, transcription, active intergenic regions, large-scale repressed regions and repetitive chromatin.
Each chromatin state showed specific enrichments for particular sequence motifs, suggesting distinct biological roles.
This approach, therefore, provides a means of annotating the human genome with respect to function and describes the locations of regions with diverse classes of epigenetic function across the genome [130].
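The core idea behind these HMM-based approaches, assigning each genomic bin to a hidden chromatin state given the observed marks, can be sketched with a minimal Viterbi decoder. This is not the trained model of Won et al. [129] nor the multivariate HMM of Ernst and Kellis [130]: the two states, the single binary mark and every probability below are hypothetical values chosen for illustration.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path for a discrete HMM, computed in log space."""
    if not observations:
        return []
    # score[s] = best log-probability of any path ending in state s.
    score = {s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}
    backpointers = []
    for obs in observations[1:]:
        new_score, pointers = {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: score[p] + math.log(trans_p[p][s]))
            new_score[s] = (score[best_prev] + math.log(trans_p[best_prev][s])
                            + math.log(emit_p[s][obs]))
            pointers[s] = best_prev
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical two-state model; observation 1 means "mark present" in a bin.
states = ("promoter_like", "background")
start_p = {"promoter_like": 0.5, "background": 0.5}
trans_p = {"promoter_like": {"promoter_like": 0.9, "background": 0.1},
           "background": {"promoter_like": 0.1, "background": 0.9}}
emit_p = {"promoter_like": {1: 0.8, 0: 0.2},
          "background": {1: 0.1, 0: 0.9}}

decoded = viterbi([0, 0, 1, 1, 1, 0, 0], states, start_p, trans_p, emit_p)
```

With these toy parameters the three marked bins are labeled promoter_like and the flanking bins background. In the published analyses the observations are combinations of many marks and the parameters are learned from data rather than set by hand.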
There is considerable uncertainty regarding the influence of variations in chromatin structure and transcription factor binding on gene expression, and whether such variations underlie or contribute to phenotypic differences.
To address this question, McDaniell et al. [131] cataloged variation in chromatin structure and transcription factor binding between individuals and between homologous chromosomes within individuals (allele-specific variation).
The analysis was carried out on lymphoblastoid cells from individuals with diverse geographical ancestries. They reported that 10% of active chromatin sites were specific to individuals, and a similar proportion was allele-specific.
Both individual-specific and allele-specific sites could be transmitted from parent to child, suggesting that these epigenetic marks are heritable features of the human genome.
The study highlights the potential importance of heritable epigenetic variation for phenotypic variation in humans [131].
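One simple way to ask whether a site is allele-specific is to compare the reads that carry each allele at a heterozygous position against an equal split. The sketch below uses an exact two-sided binomial test for that purpose. It is an assumption made for illustration and not necessarily the procedure used by McDaniell et al. [131]; the read counts are hypothetical.

```python
from math import comb


def binomial_two_sided_p(ref_reads, alt_reads):
    """Exact two-sided binomial p-value for H0: both alleles are equally represented."""
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0
    k = min(ref_reads, alt_reads)
    # With p = 0.5 the distribution is symmetric, so double the lower tail.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical read counts at two heterozygous sites.
balanced = binomial_two_sided_p(5, 5)    # no evidence of allelic imbalance
skewed = binomial_two_sided_p(10, 0)     # all reads from one allele
```

In a genome-wide setting such p-values would also need correction for the number of sites tested.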
Ernst et al. [132] extended their earlier chromatin profiling analysis described above by mapping nine chromatin marks in nine different human cell types with the aim of identifying regulatory elements, their cell-type specificities and their functional interactions.
By comparing chromatin profiles across a range of cell types they were able to define cell-type-specific patterns of promoters and enhancers affecting chromatin status, gene expression, regulatory motif enrichment and regulator expression.
Using the profiles, they linked enhancers to putative target genes and predicted the cell-type-specific activators and repressors with which they interacted [132].
Computational methods for analyzing data from epigenomic studies are being continually developed and becoming ever more sophisticated; they have been used to identify functional genomic elements and to determine gene structures and cis-regulatory elements.
For example, Hon et al. [133] described a statistical program called ChromaSig with the capacity to identify commonly occurring chromatin signatures from histone modification data.
They demonstrated the potential utility of the algorithm in data from HeLa cells by identifying five clusters of chromatin signatures associated with transcriptional promoters and enhancers. Thus, through use of ChromaSig, chromatin signatures associated with specific biological functions were identified.
2.8 General Overview and Future Perspective #
Over the last decade, the technologies available to study the mechanisms and consequences of epigenetic modifications have increased exponentially.
The stimulus for this has been the rapid increase in our understanding and appreciation of the importance of epigenetic changes on phenotypes and in the etiology of diseases.
Technological advances now enable large-scale epigenomic analyses. The first whole-genome, high-resolution maps of epigenetic modifications have been produced, but there is clearly much more to do.
Detailed maps of the human methylome, histone modifications and nucleosome positions in healthy and diseased tissues are still needed.
This review section has attempted to provide an overview of the currently available techniques and to discuss some of the advantages and limitations of each technology.
With the rapid growth in interest in understanding the epigenetic regulation of disease development, a variety of new and improved methodologies are certain to emerge in the coming years.
These technologies will undoubtedly change the scope of epigenetic studies and will provide valuable new insights into the developmental basis of diseases and into reproductive toxicology.
In particular, NGS technologies will find widespread use and relevance in many different areas of biology, far beyond the test-bed of epigenetics.
Here, we outline a promising strategy for epigenome investigation that combines several of the epigenetic methods described above (Figures 2.2A, 2.2B).
The early/late-switch regions of replication timing generally correspond to chromosomal zones with transitions in relative GC content; they are also correlated with R/G chromosome band boundaries, and are suspected of being “unstable” genomic regions that have increased susceptibility to epigenetic mutation and DNA damage [5,99].
There is a clear need for further epigenomic analysis on chromosomal band structures, in particular, to obtain a greater understanding of these epimutation-sensitive regions at the genome sequence level.
Before performing epigenomic analysis using DNA methylation and histone modification, the direct correspondence between chromosomal band (R-, T-, and G-bands) and genome sequence should be elucidated by analyzing DNA replication timing and GC%, etc. (Figures 2.2A, 2.2B).
Finally, we suggest that epigenomic analysis focused on chromosomal band structures, the boundaries of which were identified as epimutation-sensitive genomic regions at the genome sequence level, will provide considerable insights into normal and disease conditions.
Chap3. DNA Methylation Alteration in Human Cancer #
Chap4. Alterations of Histone Modifications in Cancer #
Chap5. MicroRNA in Oncogenesis #
Chap6. Epigenetics Approaches to Cancer Therapy #
6.1 Introduction #
6.2 Histone Acetylation #
6.3 Histone Deacetylases #
6.4 Histone Methylation and Demethylation #
6.5 DNA Methylation #
6.6 Acetylation of Non-Histone Proteins #
6.7 Future Directions #