International HapMap Project

The International HapMap Project was an organization that aimed to develop a haplotype map
(HapMap) of the human genome describing the common patterns of human genetic variation. The HapMap is
used to find genetic variants affecting health, disease and responses to drugs and environmental factors. The
information produced by the project is freely available for research.

The International HapMap Project was a collaboration among researchers at academic centers, non-profit
biomedical research groups and private companies in Canada, China (including Hong Kong), Japan,
Nigeria, the United Kingdom, and the United States. It officially started with a meeting on October 27 to
29, 2002, and was expected to take about three years. It comprised two phases; the complete data obtained
in Phase I were published on 27 October 2005.[1] The analysis of the Phase II dataset was published in
October 2007.[2] A third phase followed: the Phase III dataset was released in spring 2009, and the
publication presenting the final results appeared in September 2010.[3]

Background
Unlike with the rarer Mendelian diseases, combinations of different genes and the environment play a role
in the development and progression of common diseases (such as diabetes, cancer, heart disease, stroke,
depression, and asthma), or in the individual response to pharmacological agents.[4] To find the genetic
factors involved in these diseases, one could in principle do a genome-wide association study: obtain the
complete genetic sequence of several individuals, some with the disease and some without, and then search
for differences between the two sets of genomes. At the time, this approach was not feasible because of the
cost of full genome sequencing. The HapMap project proposed a shortcut.
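
The comparison of the two sets of genomes described above can be illustrated, at a single variable site, with a basic allelic chi-square test. The following is a minimal Python sketch, not the method of any particular study: the function name and the example counts are hypothetical, and real association studies add quality control and correction for testing many sites.

```python
from math import erfc, sqrt


def allelic_chi_square(case_counts, control_counts):
    """Chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    case_counts and control_counts are (count of allele 1, count of allele 2)
    observed in individuals with and without the disease.
    Returns (chi-square statistic, p-value).
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    margins = (a + b, c + d, a + c, b + d)
    if 0 in margins:
        raise ValueError("each group and each allele must be observed at least once")
    chi2 = n * (a * d - b * c) ** 2 / (margins[0] * margins[1] * margins[2] * margins[3])
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: allele 1 is seen 60 times in cases and 40 times in controls.
chi2, p_value = allelic_chi_square((60, 40), (40, 60))
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```

A small p-value indicates that the allele frequencies differ between the two groups more than chance alone would suggest; it does not by itself show that the site causes the disease.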

Although any two unrelated people share about 99.5% of their DNA sequence, their genomes differ at
specific nucleotide locations. Such sites are known as single nucleotide polymorphisms (SNPs), and each
of the possible resulting gene forms is called an allele.[5] The HapMap project focuses only on common
SNPs, those where each allele occurs in at least 1% of the population.
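
The 1% criterion can be expressed as a simple filter on allele counts. The sketch below assumes allele counts for one SNP are available as a Python dictionary; the function names and the example counts are illustrative and are not taken from the HapMap data.

```python
def allele_frequencies(allele_counts):
    """Convert raw allele counts, e.g. {"A": 95, "G": 5}, into frequencies."""
    total = sum(allele_counts.values())
    if total == 0:
        raise ValueError("no alleles observed")
    return {allele: count / total for allele, count in allele_counts.items()}


def is_common_snp(allele_counts, threshold=0.01):
    """Return True if the site has at least two alleles and every allele
    reaches the frequency threshold (1% by default, as in the text)."""
    freqs = allele_frequencies(allele_counts)
    return len(freqs) >= 2 and min(freqs.values()) >= threshold


print(is_common_snp({"A": 95, "G": 5}))    # rarer allele at 5%
print(is_common_snp({"A": 995, "G": 5}))   # rarer allele at 0.5%
```

In practice the frequencies are estimated from a sample, so a SNP near the threshold may be classified differently in different populations.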

Each person has two copies of all chromosomes, except the sex chromosomes in males. For each SNP, the
combination of alleles a person has is called a genotype. Genotyping refers to uncovering what genotype a
person has at a particular site. The HapMap project chose a sample of 269 individuals and selected several
million well-defined SNPs, genotyped the individuals for these SNPs, and published the results.[6]

The alleles of nearby SNPs on a single chromosome are correlated. Specifically, if the allele of one SNP for
a given individual is known, the alleles of nearby SNPs can often be predicted, a process known as
genotype imputation.[7] This is because each SNP arose in evolutionary history as a single point mutation,
and was then passed down on the chromosome surrounded by other, earlier, point mutations. SNPs that are
separated by a large distance on the chromosome are typically not very well correlated, because
recombination occurs in each generation and mixes the allele sequences of the two chromosomes. A
sequence of consecutive alleles on a particular chromosome is known as a haplotype.[8]
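
The correlation between two SNPs is commonly summarized by the squared correlation coefficient r² computed from haplotype frequencies. The following minimal sketch assumes phased haplotypes given as strings with one character per SNP and exactly two alleles per site; the function name and the toy haplotypes are illustrative only.

```python
def r_squared(haplotypes, i, j):
    """Squared correlation (r^2) between the alleles at sites i and j.

    haplotypes: list of equal-length strings, one character per SNP.
    Assumes both sites are biallelic and polymorphic in the sample.
    """
    n = len(haplotypes)
    # Use the alleles of the first haplotype as the reference alleles.
    ref_i, ref_j = haplotypes[0][i], haplotypes[0][j]
    p_i = sum(h[i] == ref_i for h in haplotypes) / n
    p_j = sum(h[j] == ref_j for h in haplotypes) / n
    p_ij = sum(h[i] == ref_i and h[j] == ref_j for h in haplotypes) / n
    d = p_ij - p_i * p_j  # linkage disequilibrium coefficient D
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denom


linked = ["AG", "AG", "CT", "CT"]      # allele at site 0 predicts site 1
unlinked = ["AG", "AT", "CG", "CT"]    # all four combinations equally common
print(r_squared(linked, 0, 1), r_squared(unlinked, 0, 1))
```

An r² close to 1 means that knowing the allele at one site almost determines the allele at the other; an r² close to 0 means the sites are nearly independent, as is typical for SNPs that are far apart.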

To find the genetic factors involved in a particular disease, one can proceed as follows. First, a
region of interest in the genome is identified, possibly from earlier inheritance studies. In this region one
locates a set of tag SNPs from the HapMap data; these are SNPs that are very well correlated with all the
other SNPs in the region. Next, one determines the genotype for these tag SNPs in several individuals,
some with the disease and some without. From the tag SNP genotypes, genotype imputation can determine
(impute) the other SNPs, and thus the entire haplotype, with high confidence. By comparing the two groups,
one determines the likely locations and haplotypes that are involved in the disease.
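
One way to picture tag SNP selection is a greedy heuristic: repeatedly pick the SNP that is strongly correlated with the largest number of SNPs not yet represented. The sketch below is only an illustration under stated assumptions and is not the procedure used by the HapMap tools: haplotypes are strings with one character per biallelic SNP, the r² threshold of 0.8 is an assumed cutoff, and all function names are hypothetical.

```python
def pairwise_r2(haplotypes):
    """Matrix of r^2 values between all pairs of sites (biallelic assumed)."""
    n = len(haplotypes)
    n_snps = len(haplotypes[0])
    # Encode each site as 1 where it matches the first haplotype's allele.
    cols = [[1 if h[k] == haplotypes[0][k] else 0 for h in haplotypes]
            for k in range(n_snps)]
    r2 = [[0.0] * n_snps for _ in range(n_snps)]
    for i in range(n_snps):
        r2[i][i] = 1.0
        for j in range(i + 1, n_snps):
            p_i = sum(cols[i]) / n
            p_j = sum(cols[j]) / n
            p_ij = sum(a * b for a, b in zip(cols[i], cols[j])) / n
            denom = p_i * (1 - p_i) * p_j * (1 - p_j)
            # Monomorphic sites carry no information; treat r^2 as 0.
            value = (p_ij - p_i * p_j) ** 2 / denom if denom > 0 else 0.0
            r2[i][j] = r2[j][i] = value
    return r2


def greedy_tag_snps(haplotypes, threshold=0.8):
    """Pick tag SNP indices so every site has r^2 >= threshold with some tag."""
    r2 = pairwise_r2(haplotypes)
    uncovered = set(range(len(r2)))
    tags = []
    while uncovered:
        # Choose the uncovered site that represents the most uncovered sites;
        # ties go to the lowest index.
        best = max(sorted(uncovered),
                   key=lambda s: sum(r2[s][t] >= threshold for t in uncovered))
        tags.append(best)
        uncovered -= {t for t in uncovered if r2[best][t] >= threshold}
    return tags


# Toy data: sites 0, 1 and 2 always change together; site 3 varies independently.
haplotypes = ["AAGC", "AAGT", "CCTC", "CCTT"]
print(greedy_tag_snps(haplotypes))
```

In this toy example two tag SNPs stand in for four sites, which is the saving the approach relies on: only the tags need to be genotyped in the case and control groups.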

Samples used
Haplotypes are generally shared between populations, but their frequency can differ widely. Four
populations were selected for inclusion in the HapMap: 30 adult-and-both-parents Yoruba trios from
Ibadan, Nigeria (YRI), 30 trios of Utah residents of northern and western European ancestry (CEU), 44
unrelated Japanese individuals from Tokyo, Japan (JPT) and 45 unrelated Han Chinese individuals from
Beijing, China (CHB). Although the haplotypes revealed from these populations should be useful for
studying many other populations, parallel studies are currently examining the usefulness of including
additional populations in the project.
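
The sample sizes above can be checked with simple arithmetic: each trio contributes three individuals, so the four panels add up to the 269 individuals mentioned in the Background section. A minimal Python sketch (the dictionary layout and function names are illustrative only):

```python
# Panel sizes as stated in the text: trios of one adult and both parents,
# or unrelated individuals.
PHASE_I_PANELS = {
    "YRI": {"trios": 30, "unrelated": 0},
    "CEU": {"trios": 30, "unrelated": 0},
    "JPT": {"trios": 0, "unrelated": 44},
    "CHB": {"trios": 0, "unrelated": 45},
}


def individuals_in_panel(panel):
    """Each trio consists of three people: two parents and their adult child."""
    return 3 * panel["trios"] + panel["unrelated"]


def total_individuals(panels):
    return sum(individuals_in_panel(p) for p in panels.values())


for name, panel in PHASE_I_PANELS.items():
    print(name, individuals_in_panel(panel))
print("total", total_individuals(PHASE_I_PANELS))
```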

All samples were collected through a community engagement process with appropriate informed consent.
The community engagement process was designed to identify and attempt to respond to culturally specific
concerns and give participating communities input into the informed consent and sample collection
processes.[9]

In Phase III, 11 global ancestry groups were assembled: ASW (African ancestry in Southwest USA);
CEU (Utah residents with Northern and Western European ancestry from the CEPH collection); CHB
(Han Chinese in Beijing, China); CHD (Chinese in Metropolitan Denver, Colorado); GIH (Gujarati Indians
in Houston, Texas); JPT (Japanese in Tokyo, Japan); LWK (Luhya in Webuye, Kenya); MEX (Mexican
ancestry in Los Angeles, California); MKK (Maasai in Kinyawa, Kenya); TSI (Tuscans in Italy); YRI
(Yoruba in Ibadan, Nigeria).[10]

Phase | ID | Population | Detail
I/II | CEU | Utah residents with Northern and Western European ancestry from the CEPH collection | https://catalog.coriell.org/1/NIGMS/Collections/CEPH-Resources
I/II | CHB | Han Chinese in Beijing, China | https://catalog.coriell.org/1/NHGRI/Collections/HapMap-Collections/Han-Chinese-in-Beijing-China-CHB
I/II | JPT | Japanese in Tokyo, Japan | https://catalog.coriell.org/1/NHGRI/Collections/HapMap-Collections/Japanese-in-Tokyo-Japan-JPT
I/II | YRI | Yoruba in Ibadan, Nigeria | https://catalog.coriell.org/1/NHGRI/Collections/HapMap-Collections/Yoruba-in-Ibadan-Nigeria-YRI
III | ASW | African ancestry in the Southwest USA | https://catalog.coriell.org/1/NHGRI/Collections/HapMap-Collections/African-Ancestry-in-SW-USA-ASW
III | CHD | Chinese in metropolitan Denver, CO, United States | https://catalog.coriell.org/1/NHGRI/Collections/HapMap-Collections/Chinese-in-Metropolitan-Denver-CO-USA-CHD
III | GIH | Gujarati Indians in Houston, TX, United States | https://catalog.coriell.org/1/NHGRI/Collections/HapMap-Collections/Gujarati-Indians-in-Houston-TX-USA-GIH
III | LWK | Luhya in Webuye, Kenya | https://catalog.coriell.org/1/NHGRI/Collections/HapMap-Collections/Luhya-in-Webuye-Kenya-LWK
III | MKK | Maasai in Kinyawa, Kenya | https://catalog.coriell.org/1/NHGRI/Collections/HapMap-Collections/Maasai-in-Kinyawa-Kenya-MKK
III | MXL | Mexican ancestry in Los Angeles, CA, United States | https://catalog.coriell.org/1/NHGRI/Collections/HapMap-Collections/Mexican-Ancestry-in-Los-Angeles-CA-USA-MXL
III | TSI | Toscani in Italia | https://catalog.coriell.org/1/NHGRI/Collections/HapMap-Collections/Toscani-in-Italia-TSI

Three combined panels have also been created, which allow better identification of SNPs in groups outside
the nine homogenous samples: CEU+TSI (Combined panel of Utah residents with Northern and Western
European ancestry from the CEPH collection and Tuscans in Italy); JPT+CHB (Combined panel of
Japanese in Tokyo, Japan and Han Chinese in Beijing, China) and JPT+CHB+CHD (Combined panel of
Japanese in Tokyo, Japan, Han Chinese in Beijing, China and Chinese in Metropolitan Denver, Colorado).
CEU+TSI, for instance, is a better model of UK British individuals than is CEU alone.[10]

Scientific strategy
In the 1990s it was expensive to sequence patients' whole genomes, so the National Institutes of Health
embraced the idea of a "shortcut": looking only at sites on the genome where many people have a variant
DNA unit. The theory behind the shortcut was that, since the major diseases are common, so too
would be the genetic variants that caused them. Natural selection keeps the human genome free of variants
that damage health before children are grown, the theory held, but fails against variants that strike later in
life, allowing them to become quite common. In 2002 the National Institutes of Health started a $138
million project called the HapMap to catalog the common variants in European, East Asian and African
genomes.[11]

For Phase I, one common SNP was genotyped every 5,000 bases. Overall, more than one million SNPs
were genotyped. The genotyping was carried out by 10 centres using five different genotyping
technologies. Genotyping quality was assessed by using duplicate or related samples and by having
periodic quality checks where centres had to genotype common sets of SNPs.

The Canadian team was led by Thomas J. Hudson at McGill University in Montreal and focused on
chromosomes 2 and 4p. The Chinese team was led by Huanming Yang in Beijing and Shanghai, and Lap-
Chee Tsui in Hong Kong and focused on chromosomes 3, 8p and 21. The Japanese team was led by
Yusuke Nakamura at the University of Tokyo and focused on chromosomes 5, 11, 14, 15, 16, 17 and 19.
The British team was led by David R. Bentley at the Sanger Institute and focused on chromosomes 1, 6,
10, 13 and 20. There were four United States genotyping centres: a team led by Mark Chee and Arnold
Oliphant at Illumina Inc. in San Diego (studying chromosomes 8q, 9, 18q, 22 and X), a team led by David
Altshuler and Mark Daly at the Broad Institute in Cambridge, USA (chromosomes 4q, 7q, 18p, Y and
the mitochondrion), a team led by Richard Gibbs at the Baylor College of Medicine in Houston (chromosome
12), and a team led by Pui-Yan Kwok at the University of California, San Francisco (chromosome 7p).

To obtain enough SNPs to create the Map, the Consortium funded a large re-sequencing project to discover
millions of additional SNPs. These were submitted to the public dbSNP database. As a result, by August
2006, the database included more than ten million SNPs, and more than 40% of them were known to be
polymorphic. By comparison, at the start of the project, fewer than 3 million SNPs had been identified, and
no more than 10% of them were known to be polymorphic.

During Phase II, more than two million additional SNPs were genotyped throughout the genome by David
R. Cox, Kelly A. Frazer and others at Perlegen Sciences and 500,000 by the company Affymetrix.

Data access
All of the data generated by the project, including SNP frequencies, genotypes and haplotypes, were placed
in the public domain and are available for download.[12] The project website also contains a genome browser
that lets users find SNPs in any region of interest, along with their allele frequencies and their association
with nearby SNPs. A tool that can determine tag SNPs for a given region of interest is also provided. These
data can also be accessed directly from the widely used Haploview program.

Publications
International HapMap Consortium (2003). "The International HapMap Project" (https://deepblue.lib.umich.edu/bitstream/2027.42/62838/1/nature02168.pdf) (PDF). Nature. 426 (6968): 789–796. Bibcode:2003Natur.426..789G (https://ui.adsabs.harvard.edu/abs/2003Natur.426..789G). doi:10.1038/nature02168 (https://doi.org/10.1038%2Fnature02168). hdl:2027.42/62838 (https://hdl.handle.net/2027.42%2F62838). PMID 14685227 (https://pubmed.ncbi.nlm.nih.gov/14685227). S2CID 4387110 (https://api.semanticscholar.org/CorpusID:4387110).
International HapMap Consortium (2004). "Integrating ethics and science in the International HapMap Project" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2271136). Nature Reviews Genetics. 5 (6): 467–475. doi:10.1038/nrg1351 (https://doi.org/10.1038%2Fnrg1351). PMC 2271136 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2271136). PMID 15153999 (https://pubmed.ncbi.nlm.nih.gov/15153999).
International HapMap Consortium (2005). "A haplotype map of the human genome" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1880871). Nature. 437 (7063): 1299–1320. Bibcode:2005Natur.437.1299T (https://ui.adsabs.harvard.edu/abs/2005Natur.437.1299T). doi:10.1038/nature04226 (https://doi.org/10.1038%2Fnature04226). PMC 1880871 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1880871). PMID 16255080 (https://pubmed.ncbi.nlm.nih.gov/16255080).
International HapMap Consortium (2007). "A second generation human haplotype map of over 3.1 million SNPs" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2689609). Nature. 449 (7164): 851–861. Bibcode:2007Natur.449..851F (https://ui.adsabs.harvard.edu/abs/2007Natur.449..851F). doi:10.1038/nature06258 (https://doi.org/10.1038%2Fnature06258). PMC 2689609 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2689609). PMID 17943122 (https://pubmed.ncbi.nlm.nih.gov/17943122).
International HapMap 3 Consortium (2010). "Integrating common and rare genetic variation in diverse human populations" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3173859). Nature. 467 (7311): 52–58. Bibcode:2010Natur.467...52T (https://ui.adsabs.harvard.edu/abs/2010Natur.467...52T). doi:10.1038/nature09298 (https://doi.org/10.1038%2Fnature09298). PMC 3173859 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3173859). PMID 20811451 (https://pubmed.ncbi.nlm.nih.gov/20811451).
Deloukas P, Bentley D (2004). "The HapMap project and its application to genetic studies of drug response" (https://doi.org/10.1038%2Fsj.tpj.6500226). The Pharmacogenomics Journal. 4 (2): 88–90. doi:10.1038/sj.tpj.6500226 (https://doi.org/10.1038%2Fsj.tpj.6500226). PMID 14676823 (https://pubmed.ncbi.nlm.nih.gov/14676823).
Thorisson GA, Smith AV, Krishnan L, Stein LD (2005). "The International HapMap Project Web site" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1310647). Genome Research. 15 (11): 1592–1593. doi:10.1101/gr.4413105 (https://doi.org/10.1101%2Fgr.4413105). PMC 1310647 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1310647). PMID 16251469 (https://pubmed.ncbi.nlm.nih.gov/16251469).
Terwilliger JD, Hiekkalinna T (2006). "An utter refutation of the 'Fundamental Theorem of the HapMap'" (https://doi.org/10.1038%2Fsj.ejhg.5201583). European Journal of Human Genetics. 14 (4): 426–437. doi:10.1038/sj.ejhg.5201583 (https://doi.org/10.1038%2Fsj.ejhg.5201583). PMID 16479260 (https://pubmed.ncbi.nlm.nih.gov/16479260).
Secko, David (2005). "Phase I of the HapMap Complete" (http://www.the-scientist.com/news/20051026/01) Archived (https://web.archive.org/web/20110514112054/http://www.the-scientist.com/news/20051026/01/) 2011-05-14 at the Wayback Machine. The Scientist.

See also
Genealogical DNA test
The 1000 Genomes Project
Population groups in biomedicine
Human Variome Project
Human genetic variation

References
1. Altshuler, David; Donnelly, Peter; The International HapMap Consortium (October 2005). "A haplotype map of the human genome" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1880871). Nature. 437 (7063): 1299–1320. Bibcode:2005Natur.437.1299T (https://ui.adsabs.harvard.edu/abs/2005Natur.437.1299T). doi:10.1038/nature04226 (https://doi.org/10.1038%2Fnature04226). ISSN 1476-4687 (https://www.worldcat.org/issn/1476-4687). PMC 1880871 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1880871). PMID 16255080 (https://pubmed.ncbi.nlm.nih.gov/16255080).
2. Frazer, Kelly A.; Ballinger, Dennis G.; Cox, David R.; Hinds, David A.; Stuve, Laura L.; Gibbs, Richard A.; Belmont, John W.; Boudreau, Andrew; Hardenbol, Paul; Leal, Suzanne M.; Pasternak, Shiran (October 2007). "A second generation human haplotype map of over 3.1 million SNPs" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2689609). Nature. 449 (7164): 851–861. Bibcode:2007Natur.449..851F (https://ui.adsabs.harvard.edu/abs/2007Natur.449..851F). doi:10.1038/nature06258 (https://doi.org/10.1038%2Fnature06258). hdl:2027.42/62863 (https://hdl.handle.net/2027.42%2F62863). ISSN 1476-4687 (https://www.worldcat.org/issn/1476-4687). PMC 2689609 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2689609). PMID 17943122 (https://pubmed.ncbi.nlm.nih.gov/17943122).
3. Altshuler, David M.; Gibbs, Richard A.; Peltonen, Leena; Dermitzakis, Emmanouil; Schaffner, Stephen F.; Yu, Fuli (September 2010). "Integrating common and rare genetic variation in diverse human populations" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3173859). Nature. 467 (7311): 52–58. Bibcode:2010Natur.467...52T (https://ui.adsabs.harvard.edu/abs/2010Natur.467...52T). doi:10.1038/nature09298 (https://doi.org/10.1038%2Fnature09298). ISSN 1476-4687 (https://www.worldcat.org/issn/1476-4687). PMC 3173859 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3173859). PMID 20811451 (https://pubmed.ncbi.nlm.nih.gov/20811451).
4. Crouch, Daniel J. M.; Bodmer, Walter F. (11 August 2020). "Polygenic inheritance, GWAS, polygenic risk scores, and the search for functional variants" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7431089). Proceedings of the National Academy of Sciences. 117 (32): 18924–18933. doi:10.1073/pnas.2005634117 (https://doi.org/10.1073%2Fpnas.2005634117). PMC 7431089 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7431089). PMID 32753378 (https://pubmed.ncbi.nlm.nih.gov/32753378).
5. "Allele" (https://www.genome.gov/genetics-glossary/Allele). Genome.gov. National Human Genome Research Institute.
6. The International HapMap Consortium (December 2003). "The International HapMap Project" (https://doi.org/10.1038%2Fnature02168). Nature. 426 (6968): 789–796. doi:10.1038/nature02168 (https://doi.org/10.1038%2Fnature02168). hdl:2027.42/62838 (https://hdl.handle.net/2027.42%2F62838). PMID 14685227 (https://pubmed.ncbi.nlm.nih.gov/14685227). S2CID 8151693 (https://api.semanticscholar.org/CorpusID:8151693).
7. Deng, Tianyu; Zhang, Pengfei; Garrick, Dorian; Gao, Huijiang; Wang, Lixian; Zhao, Fuping (2022). "Comparison of Genotype Imputation for SNP Array and Low-Coverage Whole-Genome Sequencing Data" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8762119). Frontiers in Genetics. 12: 704118. doi:10.3389/fgene.2021.704118 (https://doi.org/10.3389%2Ffgene.2021.704118). PMC 8762119 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8762119). PMID 35046990 (https://pubmed.ncbi.nlm.nih.gov/35046990).
8. "Haplotype" (https://www.genome.gov/genetics-glossary/haplotype). Genome.gov. National Human Genome Research Institute. Retrieved 25 June 2022.
9. Rotimi, Charles; Leppert, Mark; Matsuda, Ichiro; Zeng, Changqing; Zhang, Houcan; Adebamowo, Clement; Ajayi, Ike; Aniagwu, Toyin; Dixon, Missy; Fukushima, Yoshimitsu; Macer, Darryl (2007). "Community Engagement and Informed Consent in the International HapMap Project" (https://www.karger.com/Article/FullText/101761). Public Health Genomics. 10 (3): 186–198. doi:10.1159/000101761 (https://doi.org/10.1159%2F000101761). ISSN 1662-4246 (https://www.worldcat.org/issn/1662-4246). PMID 17575464 (https://pubmed.ncbi.nlm.nih.gov/17575464). S2CID 10844405 (https://api.semanticscholar.org/CorpusID:10844405).
10. International HapMap consortium et al. (2010). Integrating common and rare genetic variation in diverse human populations. Nature, 467, 52-8. doi (https://dx.doi.org/10.1038/nature09298)
11. Naidoo N, Pawitan Y, Soong R, Cooper DN, Ku CS (October 2011). "Human genetics and genomics a decade after the release of the draft sequence of the human genome" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3525251). Human Genomics. 5 (6): 577–622. doi:10.1186/1479-7364-5-6-577 (https://doi.org/10.1186%2F1479-7364-5-6-577). PMC 3525251 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3525251). PMID 22155605 (https://pubmed.ncbi.nlm.nih.gov/22155605).
12. Thorisson, Gudmundur A.; Smith, Albert V.; Krishnan, Lalitha; Stein, Lincoln D. (2005-11-01). "The International HapMap Project Web site" (http://genome.cshlp.org/content/15/11/1592). Genome Research. 15 (11): 1592–1593. doi:10.1101/gr.4413105 (https://doi.org/10.1101%2Fgr.4413105). ISSN 1088-9051 (https://www.worldcat.org/issn/1088-9051). PMC 1310647 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1310647). PMID 16251469 (https://pubmed.ncbi.nlm.nih.gov/16251469).

External links
International HapMap Project (HapMap Homepage) (http://www.hapmap.org/) Archived (https://web.archive.org/web/20140416084248/http://www.hapmap.org/) 2014-04-16 at the Wayback Machine
National Human Genome Research Institute (NHGRI) HapMap Page (http://www.genome.gov/10001688)
Browsing HapMap Data Using the Genome Browser (http://www.cshprotocols.org/cgi/content/full/2008/8/pdb.prot5023)
The Mexican Genome Diversity Project (https://archive.today/20100918023309/http://diversity.inmegen.gob.mx/gbrowse/cgi-bin/gbrowse/inmegen_diversity/)

Retrieved from "https://en.wikipedia.org/w/index.php?title=International_HapMap_Project&oldid=1139928797"
