The human genome and the chromosomal basis of heredity
Organization of the human genome.
Mitochondrial inheritance. The Human Karyotype. Structure of human chromosomes. Human genetic diversity: Mutation and Polymorphism. Single Nucleotide Polymorphisms; Insertion-Deletion Polymorphisms; Copy Number Variants; Inversion Polymorphisms.

Human Genome Project
• The publicly funded Human Genome Project (HGP) began in 1990 under the direction of James Watson, the co-discoverer of the double-helix structure of DNA. Eventually the public project was led by Dr. Francis Collins, who had previously led a research team involved in identifying the CFTR gene as the cause of cystic fibrosis. In the United States, the Collins-led HGP was coordinated by the Department of Energy and the National Center for Human Genome Research, a division of the National Institutes of Health.
• It established a 15-year plan with a proposed budget of $3 billion to identify all human genes, originally thought to number between 80,000 and 100,000, to sequence and map them all, and to sequence the approximately 3 billion base pairs thought to make up the 24 chromosomes (22 autosomes, plus X and Y) in humans.

Organization of the Human Genome
• The human genome contains 3.1 billion nucleotides, but protein-coding sequences make up only about 1.5 percent of the genome.
• Regulatory elements that influence or determine patterns of gene expression during development or in tissues were believed to account for only approximately 5% of additional sequence, although more recent analyses of chromatin characteristics suggest that a much higher proportion of the genome may provide signals that are relevant to genome functions.
• Only approximately half of the total linear length of the genome consists of so-called single-copy or unique DNA, that is, DNA whose linear order of specific nucleotides is represented only once (or at most a few times) across the entire genome.
• This concept may appear surprising to some, given that there are only four different nucleotides in DNA.
But consider even a tiny stretch of the genome that is only 10 bases long; with four types of bases, there are over a million possible sequences (4^10 = 1,048,576).
• The rest of the genome consists of several classes of repetitive DNA and includes DNA whose nucleotide sequence is repeated, either perfectly or with some variation, hundreds to millions of times in the genome.
• Most (but not all) of the estimated 20,000 protein-coding genes in the genome are represented in single-copy DNA.

Molecular organization and gene products of mitochondrial DNA
• In most eukaryotes, mitochondrial DNA (mtDNA) exists as a double-stranded, closed circle.
• In size, mtDNA is much smaller than chloroplast DNA (cpDNA) and varies greatly among organisms. In a variety of animals, including humans, mtDNA consists of about 16,000 to 18,000 bp (16 to 18 kb).
• There are several other noteworthy aspects of mtDNA. With only rare exceptions, and unlike cpDNA, introns are absent from mitochondrial genes, and gene repetitions are seldom present. Nor is there usually much intergenic spacer DNA. This is particularly true in species whose mtDNA is fairly small, such as humans.
• Human mtDNA encodes two ribosomal RNAs (rRNAs), 22 transfer RNAs (tRNAs), and 13 polypeptides essential to the oxidative respiration functions of the organelle. For instance, mitochondrial-encoded gene products are present in all of the protein complexes of the electron transport chain found in the inner membrane of mitochondria. In most cases, these polypeptides are part of multichain proteins, many of which also contain subunits that are encoded in the nucleus, synthesized in the cytoplasm, and then transported into the organelle. Thus, the protein-synthesizing apparatus and the molecular components for cellular respiration are jointly derived from nuclear and mitochondrial genes.
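
As a side check on the unique-sequence argument made under Organization of the Human Genome (a 10-base stretch has over a million possible sequences), the count can be reproduced with a few lines of Python. This is only an illustrative sketch: the function names `possible_sequences` and `enumerate_sequences` are hypothetical and do not come from the lecture.

```python
from itertools import product

BASES = "ACGT"  # the four DNA nucleotides


def possible_sequences(n: int) -> int:
    """Number of distinct DNA sequences of length n (4 choices per position)."""
    return len(BASES) ** n


def enumerate_sequences(n: int):
    """Yield every DNA sequence of length n; only practical for small n."""
    for combo in product(BASES, repeat=n):
        yield "".join(combo)


print(possible_sequences(10))             # 1048576
print(len(list(enumerate_sequences(3))))  # 64
```

Because the count grows as 4^n, even short stretches are very unlikely to recur by chance, which is why a specific order of nucleotides can be unique in a genome of billions of bases.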

Mutations in mitochondrial DNA

The Human Karyotype
The condensed chromosomes of a dividing human cell are most readily analyzed at metaphase or prometaphase. At these stages, the chromosomes are visible under the microscope as a so-called chromosome spread; each chromosome consists of its sister chromatids, although in most chromosome preparations the two chromatids are held together so tightly that they are rarely visible as separate entities. There are 24 different types of human chromosome, each of which can be distinguished cytologically by a combination of overall length, location of the centromere, and sequence content, the latter reflected by various staining methods. The centromere is apparent as a primary constriction, a narrowing or pinching-in of the sister chromatids due to formation of the kinetochore. This is a recognizable cytogenetic landmark, dividing the chromosome into two arms, a short arm designated p (for petit) and a long arm designated q.

Structure of human chromosomes. Nucleosomes

The Concept of Mutation
• Mutation: a change in DNA, ranging from the change of a single nucleotide to alterations of an entire chromosome. To recognize a change, there has to be a "gold standard" against which the variant is compared.
• Mutations are sometimes classified by the size of the altered DNA sequence and, at other times, by the functional effect of the mutation on gene expression.
Although classification by size is somewhat arbitrary, it can be helpful conceptually to distinguish among mutations at three different levels:
• Mutations that leave chromosomes intact but change the number of chromosomes in a cell (chromosome mutations)
• Mutations that change only a portion of a chromosome and might involve a change in the copy number of a subchromosomal segment or a structural rearrangement involving parts of one or more chromosomes (regional or subchromosomal mutations)
• Alterations of the sequence of DNA, involving the substitution, deletion, or insertion of DNA, ranging from a single nucleotide up to an arbitrarily set limit of approximately 100 kb (gene or DNA mutations)

The Concept of Genetic Polymorphism
The DNA sequence of a given region of the genome is remarkably similar among chromosomes carried by many different individuals from around the world. In fact, any randomly chosen segment of human DNA approximately 1000 bp in length contains, on average, only one base pair that differs between the two homologous chromosomes an individual inherits from his or her parents (assuming the parents are unrelated). However, across all human populations, many tens of millions of single nucleotide differences and over a million more complex variants have been identified and catalogued. Because of limited sampling, these figures are likely to underestimate the true extent of genetic diversity in our species.

Single Nucleotide Polymorphisms
• The simplest and most common of all polymorphisms are single nucleotide polymorphisms (SNPs). A locus characterized by a SNP usually has only two alleles, corresponding to the two different bases occupying that particular location in the genome.
• SNPs are common and are observed on average once every 1000 bp in the genome.
• However, the distribution of SNPs is uneven around the genome; many more SNPs are found in noncoding parts of the genome, in introns and in sequences that are some distance from known genes.
Nonetheless, there is still a significant number of SNPs that do occur in genes and other known functional elements in the genome. For the set of protein-coding genes, over 100,000 exonic SNPs have been documented to date.

Insertion-Deletion Polymorphisms
• A second class of polymorphism is the result of variations caused by insertion or deletion (in/dels or simply indels) of anywhere from a single base pair up to approximately 1000 bp, although larger indels have been documented as well.
• Over a million indels have been described, numbering in the hundreds of thousands in any one individual's genome.
• Approximately half of all indels are referred to as "simple" because they have only two alleles, that is, the presence or absence of the inserted or deleted segment.

Microsatellite Polymorphisms
• Other indels, however, are multiallelic due to variable numbers of the segment of DNA that is inserted in tandem at a particular location, thereby constituting what is referred to as a microsatellite.
• Microsatellites consist of stretches of DNA composed of units of two, three, or four nucleotides, such as TGTGTG, CAACAACAA, or AAATAAATAAAT, repeated between one and a few dozen times at a particular site in the genome.
• The different alleles in a microsatellite polymorphism are the result of differing numbers of repeated nucleotide units contained within any one microsatellite and are therefore sometimes also referred to as short tandem repeat (STR) polymorphisms.

Copy Number Variants
• Another important type of human polymorphism includes copy number variants (CNVs).
• CNVs are conceptually related to indels and microsatellites but consist of variation in the number of copies of larger segments of the genome, ranging in size from 1000 bp to many hundreds of kilobase pairs.
• Variants larger than 500 kb are found in 5% to 10% of individuals in the general population, whereas variants encompassing more than 1 Mb are found in 1% to 2%.
The largest CNVs are sometimes found in regions of the genome characterized by repeated blocks of homologous sequences called segmental duplications (or segdups).

Inversion Polymorphisms
• A final group of polymorphisms to be discussed is inversions, which vary in size from a few base pairs to large regions of the genome (up to several megabase pairs) that can be present in either of two orientations in the genomes of different individuals.
• Most inversions are characterized by regions of sequence homology at the edges of the inverted segment, implicating a process of homologous recombination in the origin of the inversions.
• In their balanced form, inversions, regardless of orientation, do not involve a gain or loss of DNA, and the inversion polymorphisms (with two alleles corresponding to the two orientations) can achieve substantial frequencies in the general population.
• However, anomalous recombination can result in the duplication or deletion of DNA located between the regions of homology; such duplications and deletions are associated with clinical disorders.
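
Two of the polymorphism classes described in these notes can be made concrete on toy sequences. The Python sketch below is illustrative only and rests on stated assumptions: sequences are plain strings over A, C, G, T; a balanced inversion is modeled as replacing a segment with its reverse complement on the same strand; and an STR allele is summarized by its longest uninterrupted run of a repeat unit. The function names are hypothetical and are not taken from any genomics library.

```python
import re

# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def invert_segment(seq: str, start: int, end: int) -> str:
    """Model a balanced inversion: seq[start:end] is replaced by its
    reverse complement, so no DNA is gained or lost."""
    return seq[:start] + reverse_complement(seq[start:end]) + seq[end:]


def longest_tandem_run(seq: str, unit: str) -> int:
    """Largest number of back-to-back copies of `unit` found in `seq`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", seq)
    return max((len(run) // len(unit) for run in runs), default=0)


original = "AAACCGTTT"
inverted = invert_segment(original, 3, 6)
print(inverted)                               # AAACGGTTT
print(len(inverted) == len(original))         # True
print(invert_segment(inverted, 3, 6))         # AAACCGTTT

# Two hypothetical alleles of a CAA microsatellite, differing in repeat count.
print(longest_tandem_run("GGCAACAACAATT", "CAA"))     # 3
print(longest_tandem_run("GGCAACAACAACAATT", "CAA"))  # 4
```

Applying the inversion twice restores the original string, which mirrors the idea of two alleles corresponding to the two orientations; the two STR alleles differ only in how many copies of the unit they carry.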