The complete genetic material of an entire organism is known as its genome.

In 1986, scientists proposed a project to make a genetic map, or catalogue, of a prototypical human, including the chromosomal location of all human genes and the complete DNA sequence of the genome.

Many scientists and physicians believe that substantial medical and other benefits could flow from knowing the location and sequence of all the genes.

Such knowledge would facilitate locating genes that are associated with diseases and disease susceptibility. It would also make possible the development of drugs that are much more specifically tailored to block particular molecules. This effort has become known as the Human Genome Project.

The Human Genome Project was launched in the fall of 1989, and James Watson, co-discoverer of the double helical structure of DNA, was appointed as the first director. Watson stated his belief that the Human Genome Project would tell us what it means to be human.

The main goal of the Human Genome Project was to read, letter by letter, the three billion bases of human DNA. Before starting to sequence the human genome, scientists built maps of the chromosomes and developed and refined techniques for analyzing DNA. With the tools in place, project scientists began large-scale DNA sequencing in 1999. In just one year, they had amassed sequence data covering more than 80 percent of the genome.

The human genome is a massive text. If the three billion letters (or bases) of the genome were printed in telephone books, they would require a stack of books nearly as tall as the Washington Monument.

To accurately determine the sequence of every base in the genome, scientists needed to read the three billion bases not just once, but at least six to ten times. Individual sequencing reactions could only reveal the order of a few hundred bases of DNA at a time - amounting to a fraction of a page. This meant that to place in order all of the DNA bases, it was necessary to produce many thousands of overlapping segments of DNA sequence.
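
The arithmetic behind this requirement can be sketched in a few lines of Python. This is only a back-of-the-envelope estimate: the read length of 500 bases is an assumed value standing in for "a few hundred bases", and the function name is made up for illustration. Under these assumptions, covering the whole genome six to ten times takes tens of millions of individual reads.

```python
GENOME_SIZE = 3_000_000_000  # bases, the figure quoted in the text
READ_LENGTH = 500            # assumed bases per sequencing reaction


def reads_needed(coverage, genome_size=GENOME_SIZE, read_length=READ_LENGTH):
    """Number of reads whose combined length equals `coverage` genomes."""
    total_bases = coverage * genome_size
    # Integer ceiling division, so a partial read counts as a whole one.
    return (total_bases + read_length - 1) // read_length


for coverage in (6, 10):
    print(f"{coverage}x coverage: about {reads_needed(coverage):,} reads")
```

The estimate ignores failed reactions and uneven coverage, so it is a lower bound rather than a count of the reactions actually run.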


To begin the project, researchers built maps of the human genome. They identified thousands of DNA sequence landmarks that helped them navigate across the chromosomes. Developing genome maps was necessary preparation for DNA sequencing. These same maps also served to orient geneticists who were hunting for disease genes.

During the Human Genome Project, every base pair of DNA was sequenced an average of nine times. Some stretches of DNA were easy to read and needed to be sequenced less often, while other stretches were more difficult to read and had to be sequenced more often.

Because the amount of DNA in even one chromosome is enormous, it is not practical to work with the whole length of a chromosome when determining sequences.

The maximum size of pieces that can be sequenced is currently about 500-700 bases long. The chromosomes are therefore separated and each is cut into overlapping pieces with restriction enzymes.

Each piece is inserted into a plasmid, which is then taken up by a bacterium. The bacteria divide repeatedly and make large quantities of one piece at a time.

The nucleotide sequence of each piece can then be determined using the established dideoxy method, which is based on DNA synthesis.

The DNA is used as a template for synthesis of new DNA strands in a test tube.

The overall result is the production of a series of smaller pieces, each one nucleotide longer than the one before.

The small pieces are then separated by electrophoresis. The pieces are made visible with fluorescent dyes; a different color is used for each of the four nucleotides.

Each dye makes visible all the pieces that end with its nucleotide.

Mistakes can occur in either copying or sequencing, and repeating the process does not always give the same answer, so the technique must be repeated several times in different laboratories.

After the sequence of each piece has been determined, the pieces must be arranged in their original order to get the overall sequence.

The sequence of bases in the DNA fragment can thus be read from the gel: the base found at the end of the shortest piece is first (it traveled farthest in the gel), followed by the base found at the end of the next longer piece (it traveled the second farthest in the gel), and so forth.
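
The reading step can be illustrated with a small Python sketch. It is a toy model, not laboratory software: the strand is a made-up example, each fragment is reduced to its length and its final (dye-labeled) base, and the function names are hypothetical. Sorting by length stands in for the separation that electrophoresis performs.

```python
import random


def chain_terminated_fragments(new_strand):
    """Model every fragment as (length, final base).

    Each prefix of the newly synthesized strand stands for one fragment
    whose synthesis stopped at a dye-labeled terminator nucleotide.
    """
    return [(n, new_strand[n - 1]) for n in range(1, len(new_strand) + 1)]


def read_gel(fragments):
    """Read the sequence from the shortest fragment to the longest."""
    return "".join(base for _, base in sorted(fragments))


strand = "GATTACAGGC"  # hypothetical newly synthesized strand
fragments = chain_terminated_fragments(strand)
random.shuffle(fragments)  # in the test tube the fragments are mixed together
print(read_gel(fragments))
```

Because every fragment length occurs exactly once, ordering the fragments by size and reading off their final bases recovers the strand whatever order they started in.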

 

As each DNA fragment reaches the end of the gel, a laser excites its fluorescent dye. A camera detects the color of the emitted light and passes that information to a computer. One by one, the machine records the colors of the DNA fragments that pass through the gel. A single sequencing reaction can reveal the order of several hundred DNA bases.

A computer program integrates the data from individual sequencing reactions. It can spot where DNA fragments overlap and order them as they originally were on the chromosome. Many overlapping sequence reads are needed to generate the uninterrupted sequence of the original stretch of DNA.
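
One simple way to picture this step is a greedy merge: repeatedly join the two reads that share the longest overlap. The Python sketch below is an illustration under that assumption and is not the software the project used; the reads, the function names and the minimum overlap of three bases are all invented for the example. Real assemblers must also cope with sequencing errors and repeated sequences, which this toy version ignores.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge the best-overlapping pair of reads until no overlaps remain."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicate reads
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # nothing overlaps; the remaining pieces stay separate
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads


# Hypothetical reads taken from the stretch ATGGCGTACGTTAGC, in scrambled order.
reads = ["ACGTTAGC", "ATGGCGTA", "GCGTACGT"]
print(greedy_assemble(reads))
```

If some reads share no overlap of at least `min_len` bases, the function returns several separate pieces instead of one, which mirrors the gaps left when coverage is too thin.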

Whenever a stretch of DNA that spanned 2,000 or more bases was assembled, it was placed into public databases within 24 hours. Anyone with access to the Internet could see and analyze the sequence data. After sequencing the 3 billion letters in the human genome an average of nine times, the Human Genome Project had released DNA sequence for 99 percent of the genome. This finished sequence was 99.99 percent accurate. The project had completed all of its goals ahead of schedule and under budget.

In February 2001, two groups simultaneously announced completion of a draft sequence of the human genome. Completion of the draft sequence supported some previously established hypotheses, but also produced some surprises. Some key results are:

About 95% of the human genome represents noncoding DNA, a large proportion of which is composed of repetitive sequences.

Less than 5% of the human genome is composed of genes, sequences that code for RNAs or proteins. It has been known for a while that the complexity of an organism does not correlate with the size of its genome. Much of the excess size is due to these noncoding, repeated sequences.

Detailed knowledge of these sequences is opening up a new resource for studying evolution. These sequences can be likened to living fossils, carried within each of us. They are already used in population genetics studies examining the migration of human populations.

The actual number of genes is smaller than previously estimated. In humans it is difficult to predict which sequences represent genes.

There are an estimated 20,000–25,000 human protein-coding genes. The estimate of the number of human genes has been repeatedly revised down from initial predictions of 100,000 or more as genome sequence quality and gene-finding methods have improved, and it could drop further.

Surprisingly, the number of human genes seems to be not much greater than that of many much simpler organisms, such as the roundworm and the fruit fly. However, human cells make extensive use of alternative splicing to produce several different proteins from a single gene, and the human proteome is thought to be much larger than those of these organisms. In addition, most human genes have multiple exons, and human introns are frequently much longer than the flanking exons.

A very high percentage of our genes are not unique to humans but are closely similar to comparable genes from other species. In fact, only 1% of human genes have no sequence similarity to genes in any other organism.

Mutation rates differ in different parts of the genome. They are also higher in males than in females, although the reason for this difference is not known. Within each gene there are, on average, 15 sites at which different individuals carry a different nucleotide, or at which the same individual may have a different nucleotide on each chromosome in a pair.

These variations, called single nucleotide polymorphisms (SNPs), are greatly expanding how many alleles are possible for different genes. Some of these polymorphisms are associated with disease; most are not, but are instead associated with small changes in protein function or regulation.
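
As an illustration of what such a site looks like in sequence data, the Python sketch below compares two copies of the same stretch of DNA and lists the positions where a single nucleotide differs. Both sequences are invented for the example, the function name is hypothetical, and the copies are assumed to be already aligned and of equal length.

```python
def snp_sites(seq_a, seq_b):
    """Return (position, base_in_a, base_in_b) wherever the two copies differ.

    Positions are 0-based. The sequences must already be aligned, so this
    finds single-base substitutions only, not insertions or deletions.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


copy_1 = "ATGGCGTACGTTAGC"  # hypothetical copy on one chromosome of a pair
copy_2 = "ATGGCATACGTTGGC"  # hypothetical copy on the other chromosome
for position, base_1, base_2 in snp_sites(copy_1, copy_2):
    print(f"position {position}: {base_1} / {base_2}")
```

The same comparison applies whether the two copies come from the two chromosomes of one person or from two different people.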

Knowledge of such small-scale variations continues to challenge our concepts of terms such as ‘heterozygous’, ‘dominant’, ‘recessive’ and ‘allele’. It also makes it clear that there is no such thing as the human genome sequence. The genome sequence of each individual is unique.

The lower estimate of the number of genes came as a shock to many scientists because counting genes was viewed as a way of quantifying genetic complexity. With about 30,000 genes, the human gene count would be only about one-half greater than that of the simple roundworm C. elegans, which has about 20,000 genes.

It could be years before a truly reliable gene count can be assessed. The reason for so much uncertainty is that predictions are derived from different computational methods and gene-finding programs. Some programs detect genes by looking for distinct patterns that define where a gene begins and ends ("ab initio" gene finding).
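
The idea of looking for patterns that mark where a gene begins and ends can be shown with a deliberately simplified Python sketch. It scans one strand for a start codon (ATG) followed, in the same reading frame, by a stop codon. This is an assumption-laden toy, not how production gene-finding programs work: human genes are interrupted by introns, so real programs rely on much richer statistical models. The sequence, the function name and the `min_codons` threshold are made up for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end) spans from an ATG to the next in-frame stop codon.

    Only the given strand is scanned, in all three reading frames.
    `end` is exclusive, so dna[start:end] includes the stop codon.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate gene
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next candidate in this frame
    return sorted(orfs)


dna = "CCATGGCTTGTTAAGG"  # hypothetical sequence
for start, end in find_orfs(dna):
    print(start, end, dna[start:end])
```

Different choices of threshold and pattern give different answers on the same sequence, which is one small-scale reason why gene counts from different programs disagree.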

Our genes are similar to 46% of the genes in yeast. More than 200 human genes and their protein products have been found to have significant similarity to those in bacteria. These genes are not found in intermediate organisms such as fruit flies, and one school of thought suggests that these genes jumped from bacteria to humans and vice versa.