HUMAN GENOME

Prof. Dr. ANIL KUMAR


SCHOOL OF BIOTECHNOLOGY
DEVI AHILYA UNIVERSITY
KHANDWA ROAD
INDORE-452001, INDIA
Email: ak_sbt@yahoo.com
INTRODUCTION

• The human genome is the genome of Homo sapiens. It is organized into 23 pairs of
chromosomes (22 autosomal pairs plus the sex chromosomes X and Y), with a total of
approximately 3 billion DNA base pairs containing an estimated 20,000-25,000 genes.

• The Human Genome Project has produced a reference sequence of the euchromatic
human genome, which is used worldwide in biomedical sciences.

• The human genome has fewer genes than expected. Only about 1.5% of it codes for
proteins; the rest consists of RNA genes, regulatory sequences, introns and,
controversially, so-called “junk” DNA.

• In December 1984, a workshop was held to review the current state of mutation detection
and characterization and to project future directions for technologies that could overcome
the prevailing technical limitations. Scientists there discussed analysis of the human
genome, and this was the first step towards nucleotide sequencing of the entire human
genome. The workshop was sponsored by the U.S. Department of Energy.

• The workshop discussed the growing role of existing DNA technologies, especially the
emerging gene cloning and sequencing technologies. It was recognized that these technologies
had been in use for about a decade, and that most scientists were working individually on
the cloning and characterization of single genes, which appeared wasteful of human and
research resources.

• Such methodologies were considered incapable of detecting mutations with good
sensitivity. Scientists concluded that an exhaustive, complex and expansive project for
complete nucleotide sequencing of the human genome should be undertaken.

• The idea of a dedicated human genome project at the U.S. Department of Energy (DOE)
grew out of the report on technologies for detecting heritable mutations in human
beings.

• In 1986, the DOE sponsored a meeting in Santa Fe, New Mexico, to assess the desirability
and feasibility of ordering and sequencing DNA clones representing the entire human genome.

FEATURES

CHROMOSOMES

• There are 24 distinct human chromosomes: 22 autosomal chromosomes, plus the
sex-determining X and Y chromosomes.

• Chromosomes 1-22 are numbered roughly in order of decreasing size.

• Somatic cells usually have one copy of chromosomes 1-22 from each parent, plus an X
chromosome from the mother and either an X or a Y chromosome from the father, for a total
of 46.

GENES

• There are an estimated 20,000-25,000 human protein coding genes.


• The number of human genes seems to be less than twice that of many much simpler
organisms, such as the roundworm and the fruit fly.

• Most human genes have multiple exons, and human introns are frequently much longer
than the flanking exons.

REGULATORY SEQUENCES

• The human genome has many different regulatory sequences which are crucial to
controlling gene expression.

• Identification of regulatory sequences relies in part on evolutionary conservation. The
evolutionary lineages leading to humans and mice, for example, diverged 70-90 million
years ago, so non-coding sequences that computer comparisons show to be conserved
between the two genomes are likely to have important functions such as gene regulation.

• Another comparative genomic approach to locating regulatory sequences in humans is
sequencing the genome of the pufferfish. These vertebrates have essentially the same genes
and regulatory sequences as humans, but only one-eighth the “junk” DNA. The
compact DNA sequence of the pufferfish makes it much easier to locate the regulatory
sequences.

OTHER DNA

• Protein coding sequences comprise less than 1.5% of the human genome.

• Aside from genes and known regulatory sequences, the human genome contains vast
regions of DNA the function of which is not known.

• These regions comprise the vast majority, by some estimates 97%, of the human
genome.

Much of this consists of:


REPEAT ELEMENTS

• Tandem repeats:
Satellite DNA
Minisatellite
Microsatellite

• Interspersed repeats:
SINEs
LINEs

TRANSPOSONS

• Retrotransposons
1. LTR
(a) Ty1-copia
(b) Ty3-gypsy
2. Non- LTR
(a) SINEs
(b) LINEs

• DNA Transposons

PSEUDOGENES

• There is a large amount of sequence that does not fall under any known classification.

• Recent experiments using microarrays have revealed that a substantial fraction of non-
genic DNA is in fact transcribed into RNA, which leads to the possibility that the resulting
transcripts may have some unknown function.
GENETIC DISORDERS

• Genetic disorders are caused by abnormal expression of one or more genes, which gives
rise to a clinical phenotype.

• A disorder may be caused by a gene mutation, an abnormal number of chromosomes, or
triplet repeat expansion mutations. Defective genes can be inherited from the parents, in
which case the disorder is known as a hereditary disease.

• There are around 4,000 known genetic disorders, with the most common being cystic
fibrosis.

• Genetic disorders are often studied by means of population genetics.

• Treatment is provided by a physician-geneticist trained in clinical genetics.

• The results of Human Genome Project are likely to provide increased availability of
genetic testing for gene related disorders, and eventually improved treatment.

• One major effect on human phenotypes derives from gene dosage, which plays a role in
disorders caused by duplication, omission, or disruption of chromosomes. For example,
people with Down syndrome (trisomy 21) experience high rates of Alzheimer’s disease, an
effect thought to be related to overexpression of the Alzheimer’s-related amyloid
precursor protein, whose gene is located on chromosome 21. By contrast, people with Down
syndrome experience lower rates of breast cancer, possibly due to overexpression of a
tumor-suppressor gene.
HUMAN GENOME PROJECT

• The Human Genome Project (HGP) is an international scientific research project.

• The project also focused on several other nonhuman organisms such as Escherichia coli,
the fruit fly, and a laboratory mouse.

• It was one of the largest investigational projects in modern science.

• The project began in 1990 initially headed by James D. Watson. A working draft of the
genome was released in 2000 and a complete one in 2003, with further analysis still being
published.

• The mapping of human genes is an important step in the development of medicines and
other aspects of health care.

• The “genome” of any given individual (except for identical twins and cloned animals) is
unique; mapping “the human genome” involves sequencing multiple variations of each
gene.
• The project did not study all of the DNA found in human cells; some heterochromatic
areas (about 8% of the total) remain un-sequenced.

• An important feature of HGP was the federal government’s long-standing dedication to the
transfer of technology to the private sector.

• By licensing technologies to private companies and awarding grants for innovative
research, the project catalyzed the multibillion-dollar U.S. biotechnology industry and
fostered the development of new medical applications.
GOALS OF THE HUMAN GENOME PROJECT

• Identify all the approximately 20,000-25,000 genes in human DNA.

• Determine the sequences of the 3 billion chemical base pairs that make up human DNA.

• Store this information in databases.

• Improve tools for data analysis.

• Transfer related technologies to the private sector.

• Address the ethical, legal, and social issues (ELSI) that may arise from the project.

• Understand the genetic makeup of the human species.
DOE’S THREE MAJOR OBJECTIVES

• Generation of refined physical maps of human chromosomes.

• Development of support technologies and facilities for human genome research.

• Expansion of communication networks and of computational and database capabilities.

• Implementation of this program began with a small number of pilot projects.


OTHER ORGANIZATIONS/COUNTRIES

• In 1988, the Office of Human Genome Research was set up with support from the US
Office of Technology Assessment, the National Research Council and the US National
Institutes of Health (NIH). It was later renamed the National Center for Human Genome
Research.

• In 1988, the US Congress approved $3 billion for the project, with a time limit of 15 years
commencing from 1991.

• The Department of Energy (DOE) funded a number of laboratories, including the Lawrence
Livermore National Laboratory, the Los Alamos National Laboratory and the Lawrence
Berkeley National Laboratory.

• Thereafter, projects were also started by European countries such as Germany, France,
Italy, Denmark, the Netherlands and the United Kingdom.

• National genome projects were also started by countries such as Australia, Canada,
Japan, Korea and New Zealand. In this way, it became a truly international project.

• Later, the Human Genome Organization (HUGO) was established to coordinate the different
national efforts and to facilitate exchange of research data, public debate etc.

• Three centers of the Human Genome Organization were established: HUGO Europe
(London), HUGO Americas (Bethesda) and HUGO Pacific (Tokyo).

• DNA sequencing technology improved dramatically, so that sequencing costs fell
substantially, although without a corresponding increase in sequencing efficiency.
HUMAN GENETIC MAPS

• Classic genetic maps for experimental organisms such as the mouse and Drosophila have
been available for many years and have been refined successively.

• These genetic maps were prepared by crossing different mutants to determine whether
two gene loci are linked.

• In humans, where such crosses are not possible, genetic maps had to be based on
polymorphic markers that were not necessarily related to disease or to genes. The early
genetic markers were protein polymorphisms, particularly blood group and serum protein
markers.

• In the 1980s, restriction fragment length polymorphisms (RFLPs) made construction of
human genetic maps practical.

• In 1987, the first human genetic map was reported, based on 403 polymorphic loci
including 393 RFLP markers.

• Afterwards, high-resolution genetic maps were constructed using microsatellite
markers, which are more polymorphic than RFLP markers.

• Unlike minisatellites, which tend to cluster near the telomeres, microsatellites are
widely distributed throughout the genome. Microsatellites, also called short tandem
repeat polymorphisms (STRPs), are abundant.

• In 1992, a second human genetic map was published. It was built by selecting polymorphic
CA/TG repeats, mapping them to specific chromosomes by typing panels of human-rodent
somatic cell hybrids, and performing statistical linkage analyses on markers from
individual chromosomes.

• A total of 813 markers were organized into 23 linkage groups.

• Since then, maps have been produced with increasing numbers of genetic markers,
especially microsatellite markers, and with ever-increasing resolution. Many different
genetic maps have been constructed using different marker sets.

• Not all genetic maps have used the same sets of reference families, and the relationship
between the different markers used in these maps is not clearly known.

• Today, the most widely distributed set of reference families is the one derived from the
Centre d’Étude du Polymorphisme Humain (CEPH) collaboration of more than 100
independent laboratories. Integrated genetic maps are now prepared.
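
The CA/TG microsatellite markers described above are simple enough to locate computationally. The following is a minimal sketch, not the procedure used by the mapping groups: the function name `find_ca_repeats` and the threshold of six repeat units are assumptions chosen for illustration.

```python
import re


def find_ca_repeats(sequence, min_repeats=6):
    """Return (start, repeat_count) for each run of CA or TG dinucleotides.

    min_repeats is an illustrative threshold, not a value from the talk.
    """
    seq = sequence.upper()
    # A run of at least min_repeats consecutive CA units, or TG units.
    pattern = re.compile(
        r"(?:CA){%d,}|(?:TG){%d,}" % (min_repeats, min_repeats)
    )
    return [(m.start(), len(m.group()) // 2) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical sequence: eight CA units, then six TG units.
    example = "GGATC" + "CA" * 8 + "TTAGC" + "TG" * 6 + "AA"
    for start, count in find_ca_repeats(example):
        print(f"repeat run at position {start}: {count} units")
```

Because the number of repeat units at a given locus differs between individuals, the count returned for each run is the quantity that makes such a locus useful as a polymorphic marker.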

Advantages of Genetic Maps

• Genetic maps helped in studying the nature of recombination in humans.

• They also helped in determining gene localization.

• Genetic maps also helped in gene cloning.

• Linkage analysis is also helpful in prenatal or presymptomatic diagnosis of inherited
disease genes, because many DNA markers are available in the vicinity of the disease locus.
HUMAN PHYSICAL MAPS

• A complete physical map consists of 24 component maps, corresponding to the 24
chromosomes.

• The first human physical map was obtained using cytogenetic banding techniques, which
distinguish not only the different chromosomes but also different sub-chromosomal regions.

• Scientists associated with the public Human Genome Project and with Celera Genomics
published sequences of human genomic DNA:

Nature (Feb. 15, 2001); Science (Feb. 16, 2001)

www.ornl.gov/hgmis/project/journals/journals.html

• The sequence is a magnificent and unprecedented resource and will be a basis for research
and discovery throughout this century and beyond.

• Research on the human genome sequence will have diverse practical applications and will
affect how we view ourselves and our place in the tapestry of life around us.

• With the entire sequence in hand, the total number of genes in the human genome is
estimated to be nearly 25,000. Others have estimated up to 30,000-35,000. This is much
lower than the number predicted earlier, which was nearly 100,000 or even more.

• It is suggested that the genetic key to human complexity lies not in the number of genes
but in how gene parts are used to synthesize different protein products.

• The earlier concept of one gene, one protein no longer holds. It is now estimated that a
single gene may give rise to up to 20 proteins, depending upon the age of the individual.
This is due to the process of alternative splicing.

• Besides, thousands of post translational chemical modifications are made to proteins, and
regulatory mechanisms controlling these processes add to the complexity.

• In constructing the draft sequence, 16 genome sequencing centers produced over 22.1
billion bases of raw sequence data, comprising overlapping fragments totaling 3.9 billion
bases. The genome was thus sequenced about seven times over. Over 30% of the data is
high-quality, finished sequence with 8-10 fold coverage, 99.99% accuracy and few gaps. All
data are freely available via the web
(www.ornl.gov/hgmis/project/journals/sequencesites.html).

• The final complete human genome sequence was presented on October 22, 2004.
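
The sevenfold figure quoted for the draft can be checked with simple arithmetic. The sketch below divides the raw sequence total by the genome size of 3,164.7 million nucleotides quoted in this talk. The second function is an added assumption: it uses the idealized Lander-Waterman (Poisson) model, which the talk does not mention, to suggest why such redundancy leaves few gaps.

```python
import math


def fold_coverage(raw_bases, genome_size):
    """Average number of times each base of the genome was read."""
    return raw_bases / genome_size


def expected_uncovered_fraction(coverage):
    """Fraction of bases expected to be missed entirely.

    Assumes reads land uniformly at random (idealized Poisson model),
    which real sequencing libraries only approximate.
    """
    return math.exp(-coverage)


if __name__ == "__main__":
    raw_bases = 22.1e9       # raw sequence data quoted for the draft
    genome_size = 3164.7e6   # genome size quoted in this talk
    coverage = fold_coverage(raw_bases, genome_size)
    print(f"average coverage: about {coverage:.1f}-fold")
    print(f"expected uncovered fraction: {expected_uncovered_fraction(coverage):.5f}")
```

Under these assumptions the redundancy comes out at roughly sevenfold, and fewer than one base in a thousand would be expected to go unsampled.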

The following points have been highlighted in the human genome sequence:

• The human genome contains 3,164.7 million nucleotides.

• The average gene has 3,000 nucleotides, but sizes vary greatly. The largest known human
gene is dystrophin, with nearly 2.4 million bases.

• The total number of genes is about 25,000, much lower than previous estimates of
80,000-140,000. Those estimates had been based on extrapolations from gene-rich areas
rather than on a composite of gene-rich and gene-poor areas.

• Almost all (nearly 99.9%) nucleotide bases are exactly the same in all people.

• Functions are unknown for over 50% of the discovered genes.

• Less than 2% of the genome codes for proteins.

• Repeated sequences that do not code for any protein make up at least 50% of the
genome. These are often called “junk DNA”.

• Repetitive sequences are thought to have no direct function, but they shed light on
chromosome structure and dynamics. These repeats are thought to reshape the genome
by rearranging it, creating entirely new genes, and modifying and reshuffling existing genes.

• Over the last 50 million years, the rate of accumulation of repeats in the human genome
seems to have decreased dramatically.

• The human genome’s gene-dense “urban centers” are predominantly composed of the DNA
building blocks G and C.

• In contrast, the gene-poor “deserts” are enriched in the DNA building blocks A and T.
GC-rich and AT-rich regions can usually be seen through a microscope as light and dark
bands on chromosomes, respectively.

• Genes appear to be concentrated in random areas along the genome with vast expanses of
non-coding DNA in between.

• Stretches of up to 30,000 C and G bases repeating over and over often occur adjacent to
gene-rich areas, forming a barrier between the genes and the junk DNA. These CpG islands
are believed to have a role in regulating gene activity.

• Chromosome 1 has the most genes (2,968) and the Y chromosome has the fewest (231).

• Unlike the seemingly random distribution of gene-rich areas in humans, the genomes of
many other organisms are more uniform, with genes evenly spaced throughout.

• Humans have on average three times as many kinds of proteins as the fly and the worm.
This is because of alternative splicing of mRNA transcripts and chemical modifications to
the proteins, processes that can yield different protein products from the same gene.

• Humans share most of the same protein families with worms, flies and plants. However, the
number of gene family members is expanded in humans especially in proteins involved in
development and immunity.

• The human genome has a much greater proportion of repeat sequences (50%) than
Arabidopsis (mustard weed) (11%), the worm (7%), and the fly (3%).

• Although humans appear to have stopped accumulating repeated DNA sequences over 50
million years ago, there seems to be no such decline in rodents. This may account for some
of the fundamental differences between hominids and rodents, although gene estimates are
similar in these species. Scientists have proposed many theories to explain evolutionary
contrasts between humans and other organisms, including differences in life span, litter
size, inbreeding, genetic drift etc.

• In 2003, the finished sequences were submitted.

• According to the latest estimate, a total of 24,847 genes are predicted in the entire
human genome.
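
The contrast drawn above between GC-rich, gene-dense regions and AT-rich, gene-poor regions rests on a quantity that is easy to compute: the fraction of G and C bases in a stretch of sequence. The sketch below is illustrative only. The function names and the window and step sizes in the usage example are assumptions, not values from the talk.

```python
def gc_content(sequence):
    """Fraction of bases in the sequence that are G or C (0.0 if empty)."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(sequence, window, step):
    """GC content of successive windows, as (start, fraction) pairs."""
    return [
        (start, gc_content(sequence[start:start + window]))
        for start in range(0, len(sequence) - window + 1, step)
    ]


if __name__ == "__main__":
    # Hypothetical sequence: a GC-rich stretch followed by an AT-rich one.
    example = "GGCCGCGC" + "ATATTAAT"
    for start, fraction in windowed_gc(example, window=8, step=8):
        print(f"window at {start}: GC fraction {fraction:.2f}")
```

Scanning a chromosome in windows like this is one simple way to picture the “urban centers” and “deserts” as peaks and troughs of GC fraction.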
VARIATIONS AND MUTATIONS

• About 1.4 million locations have been identified where single base differences occur in
humans. This information promises to revolutionize the processes of finding chromosomal
locations for disease associated sequences and tracing human history.

• The human sequence project acquainted biologists with the importance of single
nucleotide polymorphisms (SNPs). SNPs define differences among people, predispose a
person to disease, and influence a patient’s response to a drug. It is now possible to build
microarrays that target individual SNP variations, as well as to make deeper comparisons
across the genome.

• The ratio of germline (sperm or egg cell) mutations is 2:1 in males versus females.
Researchers give several reasons for the higher mutation rate in the male germline, including
the greater number of cell divisions required for sperm formation than for eggs.
GENE PREDICTIONS

Organism                              Size (Mb)   Year of sequence   No. of genes   Gene density

Saccharomyces cerevisiae                   12.1               1996           6034            483
Escherichia coli                            4.6               1997           4200            932
Caenorhabditis elegans (roundworm)           97               1998          19099            197
Arabidopsis thaliana                        100               2000          25000            221
Drosophila melanogaster                     180               2000          13061            117
Human                                      3200               2001          25000             12



• Gene predictions are made by computational algorithms based on recognition of gene
sequence features and similarities to known genes. Gene estimates need further confirmation
including characterization of their protein products and functions.

• Gene density = Number of genes per million sequenced DNA bases.
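
The gene density definition above is a simple ratio, sketched below using the sizes and gene counts from the table. One caveat is an assumption on my part: this sketch divides by the total genome size in the Size column, whereas the table's own density column is defined per million sequenced bases, so the printed values will not match that column exactly.

```python
# (genome size in Mb, predicted number of genes), taken from the table above.
GENOMES = {
    "Saccharomyces cerevisiae": (12.1, 6034),
    "Escherichia coli": (4.6, 4200),
    "Caenorhabditis elegans": (97, 19099),
    "Arabidopsis thaliana": (100, 25000),
    "Drosophila melanogaster": (180, 13061),
    "Human": (3200, 25000),
}


def gene_density(gene_count, size_mb):
    """Number of genes per million bases."""
    return gene_count / size_mb


if __name__ == "__main__":
    for organism, (size_mb, gene_count) in GENOMES.items():
        density = gene_density(gene_count, size_mb)
        print(f"{organism:<28}{density:>8.1f} genes per Mb")
```

Either way of normalizing shows the same pattern: the human genome carries far fewer genes per million bases than the smaller genomes listed.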


STRATEGIES FOR SEQUENCING THE HUMAN GENOME

• The strategy originally established by the publicly funded effort (HGP) was based on
localizing bacterial artificial chromosomes (BACs) containing large fragments of human
DNA within the framework of a landmark-based physical map.

• Ideally, sequencing would have been done on a clone-by-clone basis, with clones
selected from the minimum BAC tiling path (i.e., a set of BACs that, with minimum overlap,
stretches across the whole length of the genome).

• The working draft, although containing some gaps and ambiguities in order, will be
extremely useful in such efforts as identifying disease-associated genes.

• The idealized strategy of Celera was to avoid the up-front mapping phase by subcloning
random fragments of the human genome directly.

• Sequencing of both ends of fragments in libraries of different sizes facilitated ordering.


• While saving effort and time at the beginning, the Celera approach would make the assembly
process much more dependent on algorithms and computer time.

• As each group worked to reach its goals, the idealized strategies evolved into hybrids in
which the HGP selected more clones arbitrarily and Celera made use of BAC maps and sequence
generated by the HGP.
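
The minimum BAC tiling path mentioned above can be pictured as an interval-covering problem: each mapped clone spans a known stretch of the chromosome, and the aim is to pick as few clones as possible that together span the whole region. The sketch below uses a standard greedy interval cover. It is an illustration under simplifying assumptions (exact clone coordinates, a hypothetical function name and made-up positions), not the procedure the HGP actually followed.

```python
def minimal_tiling_path(clones, region_start, region_end):
    """Pick the fewest clones that cover [region_start, region_end].

    clones is a list of (start, end) map positions. Returns the chosen
    clones in order, or None if the clones leave a gap in the region.
    """
    ordered = sorted(clones)
    path = []
    covered_to = region_start
    i = 0
    while covered_to < region_end:
        best = None
        # Among clones starting inside the covered part, take the one
        # that reaches furthest to the right.
        while i < len(ordered) and ordered[i][0] <= covered_to:
            if best is None or ordered[i][1] > best[1]:
                best = ordered[i]
            i += 1
        if best is None or best[1] <= covered_to:
            return None  # no clone extends the coverage: a gap in the map
        path.append(best)
        covered_to = best[1]
    return path


if __name__ == "__main__":
    # Hypothetical clone positions in kb along a 600 kb region.
    clones = [(0, 150), (100, 300), (120, 260), (250, 400), (280, 500), (390, 600)]
    print(minimal_tiling_path(clones, 0, 600))
```

In this made-up example four of the six clones suffice, which is the sense in which a tiling path saves sequencing effort compared with sequencing every mapped clone.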
DO IT YOURSELF SCIENCE

• With the sharply falling cost of equipment and the wealth of information that is publicly
available, we are reaching the point at which almost anyone with access to the internet and
to sequencing equipment can publish his or her genetic information.

• This is evident from a recent story published in Nature about Hugh Rienhoff, a trained
geneticist and biotechnology entrepreneur, whose daughter was born with a collection of
congenital defects.

• He investigated the genetic cause himself by buying lab equipment and having her
genes sequenced.

• He posted his theories about the possible cause of the disease, along with information
about her condition and genetic sequence, on the internet.

• In addition, the recent release of a greatly enhanced haplotype map, or HapMap, describes
the most common forms of human genetic variation, characterizing over 3.1 million human SNPs
across geographically diverse populations.

• These findings demonstrate the power of genomics to deliver clues that could yield better
medicine and to uncover multiple genes that may be associated with the risk of developing
specific diseases.
APPLICATIONS, FUTURE CHALLENGES

• Deriving meaningful knowledge from the DNA sequence will define research to inform
understanding of biological systems. This task will require expertise and creativity of tens of
thousands of scientists from varied disciplines in both the public and private sectors
worldwide.

• Having this sequence gives researchers a new approach to biological research. In the
past, researchers studied one or a few genes at a time. With whole genome sequences and
new high-throughput technologies, they can approach questions systematically and on a
grand scale. They can study all the genes in a genome, for example, or all the transcripts in
a particular tissue, organ or tumor, or how tens of thousands of genes and proteins work
together in interconnected networks to orchestrate the chemistry of life.
ACKNOWLEDGEMENTS

The material for this talk was collected from Human Genome News, published by the
US Department of Energy Office of Biological and Environmental Research
(July 2001 issue).
