


Ville Nikolai Pimenoff

Department of Forensic Medicine
University of Helsinki
Helsinki, Finland

Departament de Ciències de la Salut i de la Vida
Unitat de Biologia Evolutiva
Universitat Pompeu Fabra
Barcelona, Spain

Academic dissertation

To be publicly presented with the permission of the Medical Faculty of the
University of Helsinki, in the lecture hall of the Department of Forensic Medicine
on October 31st 2008 at 12 o’clock noon.

Helsinki 2008

Nikun06.indd 3 24.9.2008 16:41:03

Professor Antti Sajantila
University of Helsinki
Helsinki, Finland

Professor David Comas
Universitat Pompeu Fabra
Barcelona, Spain


Professor Ulf Gyllensten
University of Uppsala
Uppsala, Sweden

Professor Pekka Pamilo
University of Helsinki
Helsinki, Finland


Doctor Lluís Quintana-Murci
Institut Pasteur
Paris, France

ISBN 978-952-92-4331-0 (paperback)
ISBN 978-952-10-4913-2 (pdf)

Helsinki 2008

“We’ll never deal with the devils in the details unless we see the big picture.”

Paul R. Ehrlich
Human Natures—Genes, Cultures,
and the Human Prospect


LIST OF ORIGINAL PUBLICATIONS
ABBREVIATIONS
SUMMARY
1 REVIEW OF THE LITERATURE
1.1 Introduction
1.1.1 Genetic characteristics of the Finno-Ugric-speaking population
1.1.2 Sampling in human genetic studies and the concept of a population
1.2 Organization of the Human Genome
1.2.1 General structure
1.2.2 Variation in the Human Genome
1.2.3 Population processes shaping genetic variation
1.2.4 Visualizing genomic variation
  Molecular markers in human genetic studies
  Special characteristics of the uniparental markers
1.2.5 Linkage disequilibrium
  Processes shaping LD
  Block structured genome, tagSNPs and LD mapping
1.2.6 Human Genome Diversity and the HapMap project
2 AIMS OF THE PRESENT STUDY
3 MATERIALS AND METHODS
3.1 Samples
3.2 Molecular data
3.3 Data analysis
4 RESULTS AND DISCUSSION
4.1 Uniparental genetic landscape in North Eurasia (I, II)
4.2 Distribution of lactase persistence allele in North Eurasia (III)
4.3 Patterns of LD in CYP2C and CYP2D gene subfamily regions in Europe (IV)
5 CONCLUSIONS AND FUTURE PERSPECTIVES
6 ACKNOWLEDGEMENTS
7 REFERENCES


This thesis is based on the following original articles, which are referred to in the text by
their Roman numerals. Study III has also been included in Enattah NS (2005) Molecular
Genetics of Lactase Persistence, PhD thesis. University of Helsinki, Finland.

I. Hedman M, Pimenoff V, Lukka M, Sistonen P, Sajantila A (2004) Analysis of 16 Y
STR loci in the Finnish population reveals a local reduction in the diversity of male
lineages. Forensic Science International 142(1):37–43.

II. Pimenoff VN, Comas D, Palo JU, Vershubsky G, Kozlov A and Sajantila A (2008)
Northwest Siberian Khanty and Mansi populations in the junction of West and East
Eurasian gene pools as revealed by uniparental markers. European Journal of Human
Genetics advance online publication 28 May 2008 (DOI 10.1038/ejhg.2008.101).

III. Enattah NS, Trudeau A, Pimenoff V, Maiuri L, Auricchio S, Greco L, Rossi M,
Lentze M, Seo JK, Rahgozar S, Khalil I, Alifrangis M, Natah S, Groop L, Shaat N, Kozlov
A, Verschubskaya G, Comas D, Bulayeva K, Mehdi SQ, Terwilliger JD, Sahi T, Savilahti
E, Perola M, Sajantila A, Jarvela I, Peltonen L (2007) Evidence of still-ongoing
convergence evolution of the lactase persistence T-13910 alleles in humans.
American Journal of Human Genetics 81(3):615–25.

IV. Pimenoff VN, Lavall G, Comas D, Palo JU, Gut I, Cann H, Excoffier L and Sajantila
A. Fine-scale recombination and linkage disequilibrium in the CYP2C and CYP2D
cytochrome P450 gene subfamily regions in European populations and implications for
association studies of complex pharmacogenetic traits (submitted).

Additional unpublished data and supplementary material have also been included in this
thesis.

The original publications have been reproduced with the permission of the copyright
holders.



CEPH Centre d’Étude du Polymorphisme Humain
cM centimorgan
CYP cytochrome P450
CYP2C19 cytochrome P450 2C19 gene
CYP2C9 cytochrome P450 2C9 gene
CYP2D6 cytochrome P450 2D6 gene
DMEs drug-metabolizing enzymes
HGDP Human Genome Diversity Project
HGP Human Genome Project
HVS Hypervariable segment of mtDNA
indel insertion/deletion
LD linkage disequilibrium
LNP lactase non-persistence
LP lactase persistence
LPH lactase-phlorizin hydrolase
MAF minor allele frequency
mtDNA mitochondrial DNA
NCBI National Center for Biotechnology Information
Ne effective population size
NRY non-recombining part of the Y chromosome
OMIM Online Mendelian Inheritance in Man
SNP single nucleotide polymorphism
STR short tandem repeat polymorphism
tagSNP tagging SNP



In this thesis, I have explored the origins and distributions of genetic variation among the Finno-Ugric-speaking human populations living in remote areas of North Eurasia; it aims to disentangle the underlying molecular and population genetic factors which have shaped the genetic diversity of these human populations.

To determine the genetic variation within and between these human populations I have used mitochondrial, Y-chromosomal and autosomal genetic markers. In mitochondrial DNA analysis, we sequenced the HVS-I and HVS-II parts of the hypervariable control region along with phylogenetically informative SNPs from the coding region of the mitochondrial genome. Multiple STR and SNP markers were also genotyped from the non-recombining part of the Y chromosome to assess the paternal variation among the particular North Eurasian populations. Moreover, multiple SNPs were genotyped across the LCT, CYP2C and CYP2D gene regions for the autosomal genetic diversity analysis of biomedical relevance. The obtained genotypes were further analyzed using various population genetic methods.

Our results revealed unique patterns of genetic diversity among the Finno-Ugric-speaking populations. Uniparental genetic diversity suggested that some of the Finno-Ugric-speaking populations in North Eurasia have resided in the contact zone of western and eastern Eurasian gene pools. This fact, along with the reduced uniparental and biparental genetic diversity found, emphasizes the complex genetic background of these Finno-Ugric-speaking populations shaped by recurrent founder effects, admixture and genetic drift. Moreover, the high frequency of the lactase persistence T-13910 allele among the Finno-Ugric-speaking populations and the haplotype background shaped by recent positive selection suggest a local adaptive response to a lactose-rich diet in North Eurasia. The Finno-Ugric-speaking Saami show a significant difference in haplotype structure and LD within the cytochrome P450 CYP2C and CYP2D gene subfamily region mainly due to genetic drift, although the role of selection on these genes responsible for xenobiotic metabolism cannot be excluded.

Based on our observations, the Finno-Ugric-speaking human populations show unique genetic features due to the complex background of genetic diversity shaped by molecular and population genetic processes and adaptation to remote areas of Boreal and Arctic North Eurasia.



1.1 INTRODUCTION

Since the discovery of numerous polymorphic markers in the human genome (Landsteiner 1931) and a well-established population genetic theory pioneered by Haldane (1924), Fisher (1930) and Wright (1931), great interest has focused on studies of genetic variation in natural human populations (Collins 2003, Jobling et al. 2003, Kidd et al. 2004). Similarly, the recent publication of the entire map of the human genome along with the vast number of available genetic markers and new computational tools have enabled the analysis of whole genomes (Lander et al. 2001, Venter et al. 2001, Collins et al. 2003, International Human Genome Sequencing Consortium 2004). The observed genomic diversity within and among human individuals, groups and closely related species has challenged us to understand the comprehensive heritable variation in humans (Carroll 2003, Collins 2003, Kidd et al. 2004). How much of the variation has any functional significance? How rare or common is a particular fraction of the variation? What is the distribution of the variation within humans and what molecular or population level factors caused the distribution of variation that we see today? In this thesis, I have examined the origin and distribution of genetic variation among the North Eurasian Finno-Ugric-speaking populations by using mitochondrial, Y-chromosomal and autosomal molecular markers.

1.1.1 Genetic characteristics of the Finno-Ugric-speaking populations

It has been suggested that some of the North Eurasian Finno-Ugric-speaking populations (e.g. the Finns, Saami, Khanty and Mansi) may hold genetic traces of early Upper Paleolithic people, who first colonized the North Eurasian regions c. 12,000 BP (Cavalli-Sforza et al. 1994, Derbeneva et al. 2002, Norio 2003b, Ross et al. 2006). The Finno-Ugric speakers represent populations that can be clearly classified into linguistic groups (Figure 1A) and that inhabit relatively large and geographically remote areas in North Eurasia (Figure 1B).

It has been shown that the anatomically modern human (i.e. Homo sapiens) colonized the ice-free Eurasia including some areas north of the Arctic Circle already during the Upper Paleolithic (< 40,000 BP) (Pavlov et al. 2001, Vasil’ev et al. 2002, Goebel et al. 1993). The Last Glacial Maximum (23,000–14,000 BP) unquestionably limited the northward spread of modern humans, until a rapid global warming around 12,000 BP initiated the melting of the continental ice sheet revealing novel areas for colonization (Hewitt 1999). Indeed, the permanent arrival of modern humans into North Eurasia (Nordqvist 2000, Vasil’ev et al. 2002, Bergman et al. 2004) is dated to the early Boreal period (< 12,000 BP), although the geographical origin of these settlements is not entirely clear (Nordqvist 2000, Dolukhanov et al. 2002, Kuzmin and Keates 2004).

The Finnish population is one of the genetically well-studied Finno-Ugric-speaking human populations (Nevanlinna 1972, Kere 2001, Norio 2003a; 2003b; 2003c). A particular reason for this has been the Finnish Disease Heritage, a highly specific spectrum of more than 30 inherited, mostly recessive diseases with high prevalence in the Finnish population but rare or absent in



Figure 1. A) Human populations speaking Finno-Ugric languages belong to a specific branch of the
Uralic language family which is distinct from the Samoyed-speaking branch also within the Uralic
group (Greenberg 2000). The Finno-Ugric language group is further divided into four subclusters
within the Finnic group, i) Baltic languages of Finnish, Estonian and Karelian, ii) Saami languages,
iii) Volgaic languages of Erza, Moksha and Mari, iv) Permic languages of Komi and Udmurt, while
the Ugric group consists of the Khanty, Mansi and Hungarian languages (Abondolo 1998). B) A map
showing the geographic locations of the North Eurasian Finno-Ugric-speaking populations used in
this study (I–IV).

other populations (Perheentupa 1995, Norio 2003c). Finns are also considered ethnically more homogeneous than several other European populations (Nevanlinna 1972, Kere 2001). However, already based on classical protein markers the Finnish population was shown to share genetic roots not only with the West European populations but also with more eastern populations (Nevanlinna 1972; 1984, Guglielmino et al. 1990). These observations attracted studies of mitochondrial (mtDNA) and Y-chromosomal markers to characterize the maternal and paternal Finnish gene pool, respectively (Vilkki et al. 1988, Pult et al. 1994, Sajantila et al. 1994; 1995; 1996, Lahermo et al. 1996; 1999, Zerjal et al. 1997; 2001, Kittles et al. 1998; 1999, Finnilä et al. 2001, Meinilä et al. 2001, Raitio et al. 2001, Hedman et al. 2007, Lappalainen et al. 2006, Palo et al. 2007). Mitochondrial studies have shown a clear western origin and diversity of the Finnish gene pool, but also minor traces (< 5%) of eastern gene flow have been observed (e.g. haplogroup Z, U4 and U7; Sajantila et al. 1995, Meinilä et al. 2001, Hedman et al. 2007). The Y-chromosome variation has revealed local reduction in the genetic diversity (Sajantila et al. 1996) and significant genetic differences between Western and Eastern Finland (Kittles et al. 1998; 1999, Lahermo et al. 1999, Zerjal et al. 2001, Lappalainen et al. 2006, Palo et al. 2007). A clear eastern component (haplogroup N3; Rootsi et al. 2007) in the Finnish Y-chromosome gene pool (> 50%) has been observed (Zerjal et al. 1997, Lappalainen et al. 2006). Recent accumulation of the autosomal genetic data has clarified Finns as part of the western cluster of the Eurasian genetic landscape, although Finns are outliers among the general European populations (Cavalli-Sforza et al. 1994, Kidd et al. 2004, Lao et al. 2008).

The Saami is another relatively well-studied European Finno-Ugric-speaking ethnic group populating the northernmost parts of Norway, Sweden, Finland and the Kola Peninsula of Russia (Ross et al. 2006). Several studies have shown the Western European genetic affinity of the Saami people, although their origin is still controversial (Cavalli-Sforza et al. 1994, Sajantila et al. 1995, Tambets et al. 2004, Ross et al. 2006, Ingman and Gyllensten 2007, Johansson et al. 2008). However, it has been demonstrated that Saami are genetically extreme outliers within Europe (Cavalli-Sforza et al. 1994), with strikingly low mtDNA diversity (Sajantila et al. 1995, Lahermo et al. 1996, Tambets et al. 2004). The Saami mtDNA lineages are mainly of Western Eurasian origin while two East Eurasian lineages (i.e. haplogroup D5 and Z) indicate minor (≤ 6%) Asian contribution as well (Torroni et al. 1998, Meinilä et al. 2001, Tambets et al. 2004, Ingman and Gyllensten 2007). Similarly, the Y-chromosome variation separates the Saami from other North Eurasian populations, and shows low genetic diversity (Sajantila et al. 1996, Lahermo et al. 1999, Tambets et al. 2004). Two West Eurasian lineages (i.e. haplogroup I and R1a) together and one East Eurasian N3 lineage account for ~40% of the Saami Y chromosomes, respectively (Wells et al. 2001, Tambets et al. 2004). Uniparental and autosomal marker studies suggest that the origin of the Saami gene pool is an admixture of Western and Eastern Eurasian genetic components (Cavalli-Sforza et al. 1994, Tambets et al. 2004, Ross et al. 2006, Ingman and Gyllensten 2007, Johansson et al. 2008). Based on the unique genetic diversity of Saami it is interpreted that the Saami population has remained small and constant in size since its origin (Sajantila et al. 1995, Laan and Pääbo 1997, Ross et al. 2006).

The Ugric-speaking Khanty and Mansi ethnic groups originate from a common Ob-Ugric population on the western side of the Ural mountains (Kolga et al. 2001, Derbeneva et al. 2002). Currently, they populate mainly the Ob-river valley region in East Eurasia (Kolga et al. 2001, Derbeneva et al. 2002, Karafet et al. 2002). The Mansi mtDNA variation has shown a high frequency of Western Eurasian lineages, which has been interpreted as a genetic continuum of the early Upper Paleolithic populations expanding from Near East/Southeast Europe to North Eurasia (Derbeneva et al. 2002). However, a clear East Eurasian derived mtDNA component (~38%) is present in the Mansi (Derbeneva et al. 2002). Therefore, it is proposed that the present day Mansi may also contain genetic traces of the early Upper Paleolithic people originating from Central Asia/South Siberia (Wells et al. 2001, Derbeneva et al. 2002, Karafet et al. 2002, Derenko et al. 2007). Interestingly, the Y-chromosome variation has shown some specificity (i.e. N2 lineage) among the Siberian populations speaking Uralic languages, including the Khanty (Karafet et al. 2002). However, these Uralic-speaking populations as a whole are not characterized by a discrete set of founder Y-chromosome lineages (Karafet et al. 2002). The uniparental genetic composition of the Khanty and Mansi has been interpreted to represent a recurrent amalgamation of small Eurasian population groups along with the spread of humans into North Eurasia (Derbeneva et al. 2002, Karafet et al. 2002).

The genetic diversity of the other North Eurasian Finno-Ugric-speaking populations (i.e. Estonian, Karelian, Erza, Moksha, Mari, Komi and Udmurt) has been studied less comprehensively. The main interpretation among these Finno-Ugric-speaking populations is that they have closer genetic affinities with each other than with the non-Finno-Ugric speakers (except the Latvians and Lithuanians) living in North Eurasia (Zerjal et al. 1997; 2001, Khusnutdinova et al. 1999, Rosser et al. 2000, Raitio


et al. 2001, Derbeneva et al. 2002, Karafet et al. 2002, Laitinen et al. 2002, Kutuev et al. 2006, Tambets et al. 2004, Rootsi et al. 2007).

1.1.2 Sampling in human genetic studies and the concept of a population

The sampling of individuals and the criteria for defining a population are fundamental issues in genetic studies of natural populations (Waples and Gaggiotti 2006). In biological terms a population is defined as a group of organisms of the same species that interbreed and occupy a particular space at a particular time (Krebs 1994). In practice, however, population boundaries are often notoriously difficult to define, and humans are no exception. It is thus apparent that there is no single consensus on how to define a population; instead, the definition depends on the context and objectives of the study (reviewed by Waples and Gaggiotti 2006). In practical terms, one definition of a population in humans is often a group of individuals that can be clustered according to some shared social or physical characteristic; e.g. geography, ethnicity, linguistic affiliation, culture, subsistence pattern and self-identity are often taken as proxies to define a population (Jobling et al. 2003). Consequently, several important ethical issues associated with genetic variation, ethnicity and race have been brought up (Greely 2001b, Tishkoff and Kidd 2004). Especially the indigenous peoples have been concerned about the consequences as genetic studies of human populations are of potential social and legal impact (Greely 2001ab, Collins et al. 2003, Jobling et al. 2003). It is important to note that in humans racial definitions are not based on biology. Up to 95% of the genetic variation in humans resides within populations and only 5–13% between populations (Lewontin 1972, Barbujani et al. 1997, Jorde et al. 2000, Romualdi et al. 2002, Rosenberg et al. 2002). These results clearly show that human races do not have any biological basis and should rather be considered within a complex historical and social context (Lewontin 1972, Collins et al. 2003).

Due to the sensitive issues concerning human population genetic studies, ethical guidelines have been enforced since the reports of the World Health Organization (1964), the World Medical Association (1964) and the North American Regional Committee of the Human Genome Diversity Project (1997) (reviewed by Greely 2001b). In general, these guidelines aim to inform individuals and groups of people who want to participate actively in such population genetic studies, having access to the genetic data created, and also having a possibility to influence the concept of the study. Currently, the most important ethical requirement for human population genetic studies is an informed consent from each individual sampled and, if possible, a group consent obtained from the appropriate cultural authorities of the particular population or ethnic group (Greely 2001b). In addition, guidelines for giving scientific feedback and for explaining the obtained results to the volunteers of the particular study have been proposed (Greely 2001b).

1.2 ORGANIZATION OF THE HUMAN GENOME

1.2.1 General structure

A single haploid human genome is estimated to contain about 3.2 billion nucleotides with approximately 22,000 genes (Lander et al. 2001, International Human Genome Sequencing Consortium 2004, Jobling et al. 2003). Less than 2% of the human genome is assumed to encode proteins (Lander et al. 2001, International Human Genome Sequencing Consortium 2004). This means that the number of genes in humans is far lower than earlier estimates of 100,000 and even lower than that identified in the nematode worm Caenorhabditis elegans (23,000), the fruit fly Drosophila melanogaster (26,000) and the rice Oryza sativa (45,000). Moreover, the non-coding repeat elements were shown to comprise more than 50% of the human genome, outnumbering that seen in Caenorhabditis elegans (7%) and in Drosophila melanogaster (3%) (Lander et al. 2001). Human and chimpanzee genomes only diverge by 1.2%, and human and Neanderthal genomes are estimated to diverge only by 0.5% (Chimpanzee Sequencing and Analysis Consortium 2005, Green et al. 2006). Moreover, compared to the great apes, humans as a species show low genetic diversity and low genetic structure, which indicates a demographic bottleneck in the early history of humans (Kaessmann and Pääbo 2002). Consequently, human genomes are 99.5–99.9% similar to each other (Kidd et al. 2004, Goldstein and Cavalleri 2005, International HapMap Consortium 2005; 2007). However, even a 0.1% difference between two human genomes denotes over 3 million differing bases, and around 12 million possible variants (Kruglyak and Nickerson 2001, Kidd et al. 2004). This variation is enough to ensure a unique genome for every human individual (Kidd et al. 2004).

1.2.2 Variation in the Human Genome

Genome variation occurs in various forms, such as single-base substitutions (single nucleotide polymorphisms, SNPs), insertions or deletions of tandem (short tandem repeats, STRs; variable number of tandem repeats, VNTRs) or dispersed elements (short interspersed elements, SINEs; long interspersed elements, LINEs), rearrangements (copy number variants, CNVs) and other more complex variables (Figure 2; Denoeud et al. 2003, Watkins et al. 2003, Jobling et al. 2003, Kidd et al. 2004, Redon et al. 2006).

Figure 2. Classes of typical human genome variants.

These genomic variations can also be categorized into short (< 10bp), intermediate (10bp–10kb) or long size (1kb–1Mb) genomic variants (Figure 2; Jobling et al. 2003). For practical purposes, a variant in a DNA sequence is defined as a polymorphism when at least two alleles are present in a population, and the frequency of the minor allele is MAF ≥ 0.01. The polymorphisms may be within coding, non-coding or regulatory regions. Within a coding region a polymorphism may alter the amino-acid composition or terminate the translation of the corresponding protein, while variation within a regulatory region may change the level of expression of the particular protein. As the majority of the human genome is non-coding, most of the variants do not affect the amino acids. However, there is a fraction of polymorphisms that do affect the amino-acid composition (0.7% of SNPs, dbSNP128) or gene expression (1.5% of SNPs, dbSNP128). The variants created by mutation can also be rearranged by recombination, which further increases diversity. Both non-homologous recombination between chromosomes and homologous recombination within a chromosome are fundamental genomic processes exchanging genetic material and increasing the genetic variation initially caused by mutations (Nachman 2001). The rate of recombination is typically estimated in centimorgans (cM), which describe the genetic distance in units of recombination frequency (1cM = 1% recombination). The whole genome average recombination rate is 1cM/Mb. However, the recombination rate is shown to vary extensively along the genome from 0.1cM/Mb to more than 3cM/Mb (Kong et al. 2002). More importantly, Jeffreys et al. (2001) estimated that recombination rates are organized into sharp local peaks and valleys (i.e. hotspots and coldspots, respectively) along the genome. These recombination hotspots are defined as small fractions of the genome between 1 and 2kb in which the recombination rate is ten or more times higher than in surrounding regions, although regions up to several Mb have also been observed with high recombination rates (Arnheim et al. 2003). The overall existence of the hotspots has been confirmed by larger analysis estimating between 30,000 and 50,000 hotspots along the entire genome (Myers et al. 2006). However, very little is known about the underlying biological mechanisms creating hotspots (reviewed by Arnheim et al. 2003). Based on allele-specific hotspots, it is suggested that the main factor causing hotspots is the distribution of recombination initiation sites along the genome (Jeffreys and Neumann 2002, Arnheim et al. 2003). However, most hotspots have been shown to lack these motifs, indicating multiple causes for the recombination hotspots in the human genome (Myers et al. 2005).

1.2.3 Population processes shaping genetic variation

Genetic diversity created by molecular mechanisms (i.e. mutation and recombination) is further modified by population-level processes of genetic drift, migration and natural selection (Jobling et al. 2003, Kidd et al. 2004). These evolutionary forces may affect the variation differently within and among the populations or genomic regions. Hence, although humans are genetically ~99.9% identical, evolutionary mechanisms have created a substantial amount of genetic variation, observed as up to 95% within and 5–13% among human populations (Kidd et al. 2004, McVean et al. 2005).

In a given population, each generation represents a finite sample from the previous generation. This random sampling of gametes, known as genetic drift, changes allele and haplotype frequencies between generations until the variant becomes either fixed or lost (Wright 1931). Under neutral evolution, the number of new alleles generated by mutation is mainly shaped by random genetic drift (Kimura 1968, Ohta 2002). Drift is modeled through the ideal population model and measured as the effective population size (Ne). In this context, Ne is the size of an idealized Wright-Fisher population, i.e. a population with constant size, equal sex ratio, non-overlapping generations and random mating, that experiences the same amount of genetic drift as the one under study (Wright 1931). The smaller the Ne the greater the genetic drift and vice versa. Therefore, reductions in population size (i.e. bottleneck/founder effect) increasing genetic drift can dramatically change the allele frequencies in populations. An example of genetic drift in humans is the reduced genetic diversity of modern human populations whose ancestors migrated out of Africa and experienced a bottleneck, and lower effective population sizes compared to most sub-Saharan populations (Cann et al. 1987, Armour et al. 1996, Reich et al. 2001).

Migration and subsequent gene flow homogenizes allele frequencies between populations, which reduces the effect of random genetic drift (Jobling et al. 2003). The extreme scenario of gene flow is an admixture of two populations into an admixed population. The extent of admixture is often inferred using a predefined set of parental populations from which the admixed population is assumed to have derived (Bertorelle and Excoffier 1998). Indeed, estimates of the global admixture of human populations inferred purely from the genetic structure have also been reported (Pritchard et al. 2000, Rosenberg et al. 2002).

Natural selection, originally set out by Darwin (1859) and later refined by Fisher (1930), defines the differential survival of phenotypes in succeeding generations. In other words, individuals with allele combinations better adapted to the prevailing conditions are more likely to survive and reproduce. Alleles that reduce survival are subject to negative (i.e. purifying) selection and decrease in frequency, while variants that increase survival undergo positive selection and increase in frequency. Moreover, other selective forces, such as balancing selection, may favor heterozygous loci or maintain alleles at low frequency, creating high genetic diversity, as observed at the HLA loci responsible for immune response (Beck and Trowsdale 2000, Jobling et al. 2003). Recently, several studies have used molecular data to estimate the departures of allele frequency distributions from neutral expectations and thus to detect natural selection in humans (Akey et al. 2002, Bustamante et al. 2005, Sabeti et al. 2006; 2007, Wang et al. 2006, Williamson et al. 2007). Based on current observations, it is evident that selection has a strong role in shaping human genetic variation, although the relative contribution of the positive, negative and balancing selection to human genetic variation is still unclear (Kelley et al. 2006, Kryukov et al. 2007, Nielsen et al. 2007). It is estimated that the majority of the natural selection acting on genomes is negative selection removing new deleterious mutations (Nielsen et al. 2007). But most of the genetic surveys have focused on detecting positive selection to disentangle the molecular-level traces of evolutionary adaptations and subsequent factors relating humans to their environment (Nielsen et al. 2007).

Even a weak selective benefit might cause substantial changes in allele frequencies over generations. However, in natural populations different evolutionary forces overlap with one another. Consequently, genetic drift affects the same alleles subjected to natural selection, and thus it may be difficult to identify loci subject to natural selection (Kelley et al. 2006). It is known that in populations of small Ne stronger selection is needed to influence allele frequencies, whereas allele frequencies of larger populations might be shaped by weaker selective forces (Nielsen et al. 2007).

1.2.4 Visualizing genomic variation

Molecular markers used in human genetic studies

DNA sequencing is the ultimate tool for detecting all different genetic variants in any particular genomic region (Sanger et al. 1977), but the method is often technically limited and more expensive to conduct than genotyping individual SNP or

which more than 6.6 million are validated and 7.5 million are found within genes (Build 129, April 2008). Therefore, SNPs are well-suited for several genetic and evolutionary studies including candidate gene or causal variant mapping, for assessing genetic diversity and divergence, and recently for disentangling whole genome associations of complex traits and risk factors such as cancer, diabetes and vascular diseases (Collins et al. 2004, Kidd et al. 2004, McVean et al. 2005).

STRs (i.e. microsatellites) are tandem repeat markers of 1–6 nucleotides (e.g. …CACACA… dinucleotide repeats) which are the most variable types of DNA sequences in humans (Weber 1990). The STR loci typically have a high number of different alleles per locus, each with a different number of repeat motifs (reviewed by Ellegren 2004). Moreover, STRs are found in all chromosomes, and with a high repeat
STR loci (Mir and Southern 2000, Syvän- number variability between individuals.
en 2001). Moreover, the vast number of Most of the known STRs are most likely
identified SNP and STR loci in humans neutral, although some are involved in hu-
and new high-throughput methods enable man diseases. In the disease causing mic-
efficient and simultaneous genotyping of rosatellites, the causal factor is often the
these markers (Mir and Southern 2000, increase of the repeat number over some
Syvänen 2001, Collins et al. 2003). threshold level (e.g. > 30 and up to 2000
It is estimated that SNP markers ac- repeats in myotonic dystrophy; Mahade-
count for most of all human genetic vari- van et al. 1992). Estimations using hu-
ation, and are also likely to play a crucial man pedigree and sperm sample analysis
role in how humans respond to exogenous have shown that the STR mutation rate
pathogens, chemicals, drugs and other ther- is between 10-3–10-4 per locus per gener-
apies (Collins et al. 2004, Kidd et al. 2004, ation (Heyer et al. 1997, Brinkmann et al.
International Human Genome Sequencing 1998, Sajantila et al. 1999, Kayser et al.
Consortium 2004, International HapMap 2000, Xu et al. 2000). The STR mutation
Consortium 2007, McVean et al. 2005). is mainly modeled by the stepwise muta-
SNPs are mostly biallelic with an estimat- tion model (SMM), which postulates that
ed average mutation rate of 2.3 × 10-8 per the mutations, i.e. gain or loss of one re-
site per generation (Nachman and Crowell peat, occur at fixed rate independent of re-
2000). On average, there is one SNP with- peat length (Ohta and Kimura 1973). How-
in every 200bp along the human genome, ever, this is not totally true as the rate and
as currently there are more than 18 million direction of the STR mutations are shown
SNPs listed in the human genome, from to be length-dependent (reviewed by El-


Nikun06.indd 17 24.9.2008 16:41:08
legren 2004). Microsatellites, due to their
informativeness, Mendelian inheritance
and relative ease of genotyping, have prov-
en extremely powerful for linkage analysis
of Mendelian disorders (Jorde et al. 1997,
Zhivotovsky et al. 2003) and for studies
of evolution and population genetic struc-
ture (Rosenberg et al. 2002, Ellegren et al.
2004) as well as for genetic identification
and paternity testing relevant in forensic
medicine (Jobling et al. 1997).
Due to the high number of SNP and STR
markers identified in the human genome,
some of the markers in the same chromo-
some are close to each other. Often alleles
of markers in close physical proximity are
passed on from parents to offspring togeth- Figure 3. Haplotype phase estimation between
er. These allele combinations, called haplo- two diploid loci of a) homozygotes, b) homozy-
gote and heterozygote, and c) heterozygotes.
types, either directly from haploid mtDNA Arrows denote the two diploid loci and H1/H2
and Y-chromosome loci or from diploid denote the deduced haplotypes.
chromosomes can be considered as single
alleles from a gamete. Haplotypes, com-
pared to single markers, provide a greater lotype estimations. Molecular methods in-
statistical power for analysis and reduce clude amplification of a single cell genome
the sample size needed for analysis of sig- (Ruano et al. 1990), allele specific ampli-
nificant association (Clark 2004). However, fication (Michalatos-Beloin et al. 1996) or
the deduction of haplotypes from diploid construction of mouse-human hybrid cells
marker data is not always straightforward. with haploid human genome (Patil et al.
To illustrate the problem we may assume 2001). Statistical approaches are based on
three different cases (Figure 3), where two the assumption of a common ancestor ho-
SNPs in a same chromosome a) are ho- mozygous at all sites. The steps from the
mozygous, b) only one SNP is heterozy- observed variation to this common ances-
gous and c) both SNPs are heterozygous. tor within a population are then estimated
Homozygous SNPs will produce two iden- (Clark 1990). These statistical methods
tical haplotypes, whereas two different hap- have shown to be powerful for accurate
lotypes are observed when only one site is estimation of the haplotypes from diploid
heterozygous. However, in the case of two genotypes (Excoffier and Slatkin 1995,
heterozygous SNPs, the allele combina- Stephens et al. 2001).
tions are often shuffled by recombination
making it more difficult to determine the
true haplotypes. To overcome these limi- Special characteristics
tations, genealogical, molecular and sta- of the uniparental markers
tistical methods have been used. In princi- In human cells, there are hundreds to thou-
ple, using pedigree data from parents and sands copies of cytoplasmic organelles
grandparents often enables accurate hap- called mitochondria. Each mitochondria


Nikun06.indd 18 24.9.2008 16:41:08
contain at least a single copy of mitochon- Only two pseudoautosomal telomeric seg-
drial DNA (mtDNA), organized in a small ments recombine with the X chromosome,
(~16.6kb) circular double-stranded DNA but these amount to less than 5% of the to-
molecule, which is transmitted without re- tal length of the chromosome (Jobling and
combination only through the mother. The Tyler-Smith 2003). The NRY part of the Y-
mtDNA genome contains 37 genes and a chromosome is extremely gene poor, cod-
non-coding control region including three ing for only 27 proteins but enriched with
known hypervariable segments (HVS- many types of DNA repeats and variants.
I, HVS-II and HVS-III) (Anderson et al. To date, more than 200 binary polymor-
1981, Andrews et al. 1999, Bandelt et al. phisms (i.e. SNPs), over 200 microsatellites
2006). The mtDNA has a much greater av- and several other repeat polymorphisms
erage mutation rate (3.4 × 10-7 – 3.6 × 10–6) within the NRY have been characterized
than nuclear genome (2.5 × 10-8) (Ingman (Jobling and Tyler-Smith 2003, Kayser et
et al. 2000, Nachman and Crowell 2000, al. 2004). These two marker classes have
Richards et al. 2000). As a haploid non- differing mutation rates; the slow evolv-
recombining molecule the variation with- ing SNP markers allow construction of
in mtDNA results only from the accumula- common Y-chromosome clusters (i.e. hap-
tion of mutations. The mtDNA haplotypes logroups) and their phylogeny, whereas
are often further clustered into mtDNA lin- STRs within these haplogroups enable a
eages or haplogroups (Torroni et al. 2006), more detailed haplotype resolution (Knijff
which possess a molecular record of the 2000, YCC 2002). The differing resolution
maternal genealogical history. In humans, obtained with these markers has allowed a
analysis of mitochondrial genomic diversi- detailed phylogeographic analysis of the
ty and phylogeography, an analysis of geo- human male populations (Underhill et al.
graphical distribution of the variation, were 2000, Wells et al. 2001, also reviewed by
initially studied merely with the HVS-I Jobling and Tyler-Smith 2003).
(360bp) and HVS-II (268bp) regions and Both mtDNA and Y chromosome loci
more recently using complete mtDNA ge- possess only one quarter of an effective
nomes. The mtDNA has proven powerful population size compared to the nuclear
for assessing both the micro-geographic fe- DNA. Therefore the genetic diversity and
male population histories and reconstruct- phylogenetic structure of uniparental loci
ing broader prehistoric human dispersal are more sensitive to changes in the de-
(Cann et al. 1987, Ingman et al. 2000, Tor- mography (e.g. bottlenecks) and generally
roni et al. 2006). In addition, the high copy show greater genetic differences between
number of this small circular genome has different groups or populations. All these
enabled several successful ancient DNA evolutionary characteristics combined
analyses (Pääbo 1989, Cooper and Poinar with the well-defined phylogeny and uni-
2000, Pimenoff and Korpisaari 2004). fied nomenclature system (YCC 2002, Tor-
Y chromosome, the sex-determining roni et al. 2006), make uniparental markers
male specific locus of the human genome ideal tools for investigating the recent hu-
and a uniparental haploid counterpart of man evolution, with additional important
the mtDNA, consists mainly of non-recom- applications in medical and forensic genet-
bining DNA (NRY, 57Mb–60Mb), which ics (Jobling et al. 1997, Howell et al. 2003,
is transmitted only from father to male Jobling and Tyler-Smith 2003).
offspring (Jobling and Tyler-Smith 2003).
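The one-quarter effective population size of the uniparental loci stated above can be illustrated by counting transmitted gene copies. The following is a minimal sketch, assuming an idealized population with an equal number of breeding females and males and drift as the only force; the function names, the population size of 1000 and the 200 generations are illustrative assumptions, not values from the original studies.

```python
def gene_copies(n_individuals: int) -> dict:
    """Number of gene copies per locus type in an idealized population.

    Assumes an equal sex ratio: half of the individuals are female, half male.
    """
    females = n_individuals // 2
    males = n_individuals - females
    return {
        "autosomal": 2 * (females + males),  # diploid, carried by both sexes
        "mtDNA": females,                    # haploid, transmitted by mothers only
        "Y": males,                          # haploid, transmitted by fathers only
    }


def expected_diversity(h0: float, copies: int, generations: int) -> float:
    """Expected gene diversity under drift alone: H_t = H_0 * (1 - 1/copies)^t."""
    return h0 * (1.0 - 1.0 / copies) ** generations


copies = gene_copies(1000)
# Size of each locus type relative to the autosomes
ratios = {locus: c / copies["autosomal"] for locus, c in copies.items()}
# Fraction of the initial diversity expected to remain after 200 generations
remaining = {locus: expected_diversity(1.0, c, 200) for locus, c in copies.items()}
```

Under these assumptions the mtDNA and Y-chromosome ratios both equal 0.25, and the uniparental loci retain less of their initial diversity than the autosomes over the same number of generations, which is consistent with their greater sensitivity to bottlenecks.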


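Returning to the haplotype phase problem illustrated in Figure 3, the ambiguity can be made concrete by listing every haplotype pair that is compatible with an unphased genotype. The sketch below is illustrative only: the function name and the allele letters are assumptions, and it does not reproduce the statistical inference performed by programs such as Arlequin or PHASE.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """List the unordered haplotype pairs consistent with an unphased genotype.

    genotype: one (allele, allele) tuple per locus, e.g. [("A", "G"), ("C", "T")].
    """
    pairs = set()
    # Choose, for every locus, which of its two alleles goes on haplotype 1;
    # the other allele then goes on haplotype 2.
    for choice in product((0, 1), repeat=len(genotype)):
        h1 = "".join(locus[c] for locus, c in zip(genotype, choice))
        h2 = "".join(locus[1 - c] for locus, c in zip(genotype, choice))
        pairs.add(tuple(sorted((h1, h2))))
    return sorted(pairs)


case_a = compatible_haplotype_pairs([("A", "A"), ("C", "C")])  # both homozygous
case_b = compatible_haplotype_pairs([("A", "A"), ("C", "T")])  # one heterozygous
case_c = compatible_haplotype_pairs([("A", "G"), ("C", "T")])  # both heterozygous
```

Cases a) and b) each yield a single haplotype pair, whereas the double heterozygote in case c) is compatible with two different pairs (AC/GT or AT/GC). This is the situation in which pedigree, molecular or statistical information is needed to decide the phase.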
1.2.5 Linkage Disequilibrium

Processes shaping LD

Linkage disequilibrium (LD), defined as a non-random association of alleles at linked loci, may be broken by recombination during meiosis. Alleles at loci lying in close proximity in a chromatid recombine less frequently than those far apart and are more likely to be in LD. Thus, a new mutation arising in the genome is initially in complete LD with the adjacent marker alleles, which is indicated by only three of the four possible haplotypes between the two loci within a population (Figure 4, reviewed by Ardlie et al. 2002).

Figure 4. A new mutation G* associated with A allele at nearby locus.

The most commonly used measures of pairwise linkage disequilibrium are |D'| (Lewontin 1964) and r² (Hill and Weir 1994), both of which range from no association (0.0) to complete association (1.0) between loci. Moreover, a recently developed Bayesian method to estimate the population recombination parameter ρ = 4Nₑr from genotypes has proven to be an efficient way to quantify LD differences between populations (Li and Stephens 2003, Crawford et al. 2004, Evans and Cardon 2005). However, there is no consensus on the best statistic for LD, as LD measures are known for several stochastic limitations, e.g. differing sensitivity to sampling and population genetic processes, although the ρ estimate seems to be more robust to fluctuations of ascertainment bias and marker density than the pairwise LD estimates (Lewontin 1988, Weiss and Clark 2002, Phillips et al. 2003, Evans and Cardon 2005).

The strength of LD between a pair of markers depends on both molecular and population genetic factors, which have been shown to generate varying patterns of LD across the human genome and populations (Ardlie et al. 2002, Wang et al. 2002, Bertranpetit et al. 2003, Tishkoff and Verelli 2003, Wall and Pritchard 2003, Slatkin 2008). Mutations generally create LD, but loci with a high mutation rate or a high number of alleles (e.g. STRs) tend to erode LD (Ardlie et al. 2002), although recombination and gene conversion are the main molecular factors reducing LD. The population genetic factors (drift, migration and selection) have more diverse effects on LD. The increased drift of small populations tends to increase LD as haplotypes are lost from the population (Terwilliger et al. 1998). Thus, small isolate populations have been shown to possess higher extended LD compared to populations of a larger size (reviewed in Ardlie et al. 2002). It is noteworthy that inbreeding, as a phenomenon inseparable from drift, can produce similar results of reduced heterozygosity as random genetic drift in small populations (Slatkin et al. 2008). Therefore, the combined effect of drift and inbreeding in some human populations has been proposed to have caused the extended tracts of genomic homozygosity (Gibson et al. 2006). Interestingly, gene flow between populations also enhances LD. Immediately after two populations have admixed, LD is proportional to the allele frequency differences between the parental populations and not related to the distances between markers. In the following generations, this artificial LD between unlinked markers fades, while LD between nearby markers is often slowly dissipated by recombination. Strong positive selection increases the frequency of an advantageous allele but also of alleles closely linked to it, creating unusually strong LD between the causal and neutral alleles. This phenomenon is called genetic hitch-hiking (Smith and Haigh 1974), in which an entire segment of DNA (i.e. haplotype) flanking an advantageous variant can rapidly rise to high frequency or even fixation. Selective sweeps have been shown to significantly elevate LD and reduce heterozygosity among the closely linked neutral markers within particular regions (Smith and Haigh 1974, Kim and Stephan 2002, Nielsen et al. 2005). Similarly, although the overall effect is generally not so strong, negative selection against a deleterious variant may increase LD as the deleterious haplotypes are deleted from the population.

Block structured genome, tagSNPs and LD mapping

Based on simulation studies it was assumed that genomic LD rarely extends over 3kb (Kruglyak 1999). However, recent studies have shown that a great fraction of LD in the human genome is organized into discrete sets of loci of low haplotype diversity and high LD between markers (i.e. haplotype or LD blocks) separated by short regions (1–2kb) of intense hotspots of recombination (Jeffreys et al. 2000, Jeffreys et al. 2001, Daly et al. 2001, Gabriel et al. 2002, Goldstein 2001, Patil et al. 2001, May et al. 2002). This led to the hypothesis that most of the human genome has a block-like structure with an average LD block between a few kb and 100kb (Wall and Pritchard 2001). Hence, it was proposed that only a few SNPs at each block would be successful for mapping most of the common genomic variation (Carlson et al. 2004). The structure and distribution of LD blocks along the genome has been shown to be shared by diverse human populations and would indicate a common feature in the human genome (Daly et al. 2001, Gabriel et al. 2002). However, the quest for common haplotypes of the human genome has proven to be a more difficult task than expected, with no clear current consensus (Daly et al. 2001, Gabriel et al. 2002, Zhang et al. 2002, Phillips et al. 2003, Stumpf and Goldstein 2003, Ding et al. 2005, Zeggini et al. 2005, International HapMap Consortium 2007). Nevertheless, some common features can be deduced in agreement with most LD studies. The sub-Saharan African populations tend to have shorter LD blocks compared to non-African populations. This is explained by the interplay of more recent recombination and a bottleneck leading to genetic drift experienced by modern humans since the expansion out of Africa as opposed to the present-day sub-Saharan Africans (Tishkoff et al. 1996, Jorde 2000, Gabriel et al. 2002, Wall and Pritchard 2003, Conrad et al. 2006). Moreover, in a whole genome analysis Hinds et al. (2005) estimated that non-African and African-American populations have around 95,000 and 236,000 LD blocks with an average block size of 23.0kb and 8.8kb, respectively. Therefore, it was proposed that further studies with larger sets of human populations are needed to establish more reliable definitions of the block boundaries along the human genome.

Regardless of the block criteria, certain SNPs along the genome show complete LD with each other even over longer distances (> 5kb) (Johnson et al. 2001). These tightly correlating SNPs are often called haplotype tagging SNPs (tagSNPs), as it has been shown that typing a few such tagSNPs allows prediction of most other variants within the same LD block (Johnson et al. 2001). Currently there are several approaches with congruent results to identify tagSNPs (Chi et al. 2006). These include: i) the identification of LD blocks within the genomic region of interest, ii) the estimation of pairwise LD values within the LD block, and iii) the selection of a few SNPs that capture most of the variation within the LD block (Carlson et al. 2004). However, alternative methods to define tagSNPs without LD block criteria are also currently used (Halldorsson et al. 2004). More importantly, it has been shown that tagSNPs are often well-transferable across populations at least within continental regions (Gonzalez-Neira et al. 2006, Mueller et al. 2005).

In practice, if a marker (e.g. tagSNP) is in LD with a disease-causing allele, the strength of LD between the marker and the disease variants can be used to predict the causal allele (Johnson et al. 2001). This population-based LD mapping rests on the assumption that the disease-causing mutation stays linked with markers in its physical vicinity for a certain amount of time due to the slower decay of LD with tightly linked markers (Lewontin and Kojima 1960, reviewed by Slatkin 2008). Moreover, recent observations have led to the hypothesis that populations of small and constant size are ideal for LD mapping due to the drift-enhanced disease and allele frequency differences within a population between the case and control samples (Terwilliger et al. 1998). Similarly, admixture may offer another efficient approach for LD mapping using hybrid populations compared to non-admixed populations (Chakraborty and Weiss 1988). However, the success of admixture mapping depends heavily on the time since the admixture and the frequency differences of the disease and associated alleles in parental populations (Chakraborty and Weiss 1988). Based on these observations and the unique demographic histories of the Finns and Saami, these populations have often shown markedly higher levels of extended LD compared to other European populations (Varilo et al. 1996; 2000; 2003, Laan et al. 1997; 2005, Kaessmann et al. 2002, May et al. 2002, Kauppi 2003, Johansson et al. 2005; 2007, Service et al. 2006). In this context, LD has been successfully used for mapping monogenic diseases prevalent in the Finnish population (Hästbacka et al. 1992, de la Chapelle and Wright 1998, Peltonen et al. 2000). The Saami have also been proposed as a promising target population for LD drift mapping of complex traits (Terwilliger et al. 1998, Kaessmann et al. 2002, Ross et al. 2006).

1.3 HUMAN GENOME DIVERSITY AND THE HAPMAP PROJECT

Shortly after the announcement of the Human Genome Project (HGP), Cavalli-Sforza et al. (1991) proposed a worldwide survey of human genome variation known as the Human Genome Diversity Project (HGDP). The aim of this project was to disentangle the structure and distribution of the genetic diversity in humans. Despite the difficulties in ethical issues and criticism from scientists and indigenous people (Greely 2001a), the HGDP successfully collected and announced a worldwide sample set of 1064 individuals representing 52 populations from all continents (HGDP CEPH cell line panel, Cann et al. 2002). These samples have since been used in a number of population genetic studies and the results are continuously collected into a publicly available database (Cavalli-Sforza 2005).


The discovery of the punctuate LD along the human genome (Ardlie et al. 2002, Gabriel et al. 2002), combined with the previous hypothesis of common disease/common variant (Lander 1996, Reich and Lander 2001) and the available high-throughput genotyping methods, boosted the foundation of the International HapMap Project (The International HapMap Consortium 2003). The primary aims of the HapMap were i) to discover new ascertained SNPs across the human genome, ii) to characterize a genome-wide set of SNPs validated in four human populations and iii) to produce a common haplotype map of the entire human genome using 269 DNA samples from four ethnic human groups (i.e. of African, European, Japanese and Han Chinese origin). The primary use of the common haplotype map is in whole genome association studies of complex traits (The International HapMap Consortium 2003). So far, as a phase I result the HapMap has characterized more than 4 million SNPs along the human genome, and the recently completed phase II has identified an additional 6 million SNPs (The International HapMap Consortium 2005; 2007). Currently the project has been improved by the addition of more populations (HapMap phase 3 data). Moreover, numerous fine-scale genomic analyses and genome-wide association studies have benefitted from HapMap data (Deloukas and Bentley 2004, McVean et al. 2005). Despite recent criticism (Terwilliger and Hiekkalinna 2006), the HapMap project has already strongly contributed to our quest for understanding the significance of the heritable genetic variation in modern humans and to disentangling the genetic variants relevant in complex traits of human health and disease (Deloukas and Bentley 2004, McVean et al. 2005, The International HapMap Consortium 2007).
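As a worked illustration of the pairwise LD measures |D'| and r² introduced in section 1.2.5, the following sketch computes them from the four haplotype frequencies of two biallelic loci. The function name and the example frequencies are assumptions chosen for illustration. The example mimics the situation of Figure 4, where a new mutation occurs only on the background of one allele at the neighbouring locus, so that one of the four possible haplotypes is absent.

```python
def ld_measures(p_ab: float, p_aB: float, p_Ab: float, p_AB: float):
    """Return (D, |D'|, r2) for two biallelic loci from haplotype frequencies.

    The arguments are the frequencies of haplotypes ab, aB, Ab and AB; they are
    assumed to sum to 1 and both loci are assumed to be polymorphic.
    """
    p_A = p_Ab + p_AB          # frequency of allele A at the first locus
    p_B = p_aB + p_AB          # frequency of allele B at the second locus
    d = p_AB - p_A * p_B       # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max   # Lewontin's normalized |D'|
    r2 = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d, d_prime, r2


# Hypothetical frequencies: the derived allele B is found only together with A,
# so haplotype aB is absent (three of the four haplotypes are observed).
d_new, d_prime_new, r2_new = ld_measures(0.6, 0.0, 0.3, 0.1)

# Hypothetical frequencies at linkage equilibrium.
d_eq, d_prime_eq, r2_eq = ld_measures(0.25, 0.25, 0.25, 0.25)
```

In the first example |D'| indicates complete LD while r² stays well below 1, because the two loci differ in allele frequency. This difference in behaviour is one reason why no single statistic is regarded as the best measure of LD.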



In this thesis and the articles within I have explored the underlying molecular and pop-
ulation genetic factors and processes shaping genetic variation. The main focus of this
thesis has been the Finno-Ugric-speaking populations living in remote and relatively ex-
treme geographic locations in North Eurasia.

Specifically I have focused on the following themes:

1) To study the genetic history and diversity of the Finno-Ugric-speaking populations by
using uniparental markers (I, II).

2) To determine the prevalence and haplotype background of lactase persistence variant
C/T-13910 in North Eurasian populations (III).

3) To assess the recombination rate variation, haplotype structure and LD pattern within
clinically significant cytochrome P450 CYP2C and CYP2D gene subfamily regions in
European populations including the North Eurasian Finno-Ugric-speaking Saami and
Finns (IV).




DNA samples consisted in total of 3119 healthy unrelated individuals of 53 human populations, collected with informed consent. Moreover, a total of 5697 reference samples of 42 Eurasian populations were obtained from the literature. All these samples were used in the analysis but with differing sets as described in the original publications (I–IV). It is noteworthy that our main interest concentrates on the North Eurasian Finno-Ugric-speaking populations shown in detail in Table 1 and also described in Pimenoff and Sajantila (2002).

Table 1. Finno-Ugric-speaking populations used in each study (I–IV)

Population  nᵃ   Linguistic affiliation  Geographic affiliation  Subsistence        Population sizeᵇ  References within
Finns       400  Finnic (Finno-Ugric)    Northeast Europe        Agriculture        5,000,000         I, II, III, IV
Saami       114  Finnic (Finno-Ugric)    Northeast Europe        Reindeer breeding  80,000            II, III, IV
Estonians   28   Finnic (Finno-Ugric)    Northeast Europe        Agriculture        1,300,000         II
Karelians   83   Finnic (Finno-Ugric)    Northeast Europe        Agriculture        140,000           II
Moksha      30   Volgaic (Finno-Ugric)   Northeast Europe        Agriculture        380,000           II, III
Erza        30   Volgaic (Finno-Ugric)   Northeast Europe        Agriculture        760,000           II, III
Udmurt      30   Permic (Finno-Ugric)    Northeast Europe        Agriculture        640,000           II, III
Komi        28   Permic (Finno-Ugric)    Northeast Europe        Agriculture        340,000           II, III
Khanty      106  Ugric (Finno-Ugric)     Northwest Siberia       Reindeer breeding  21,000            II, III
Mansi       161  Ugric (Finno-Ugric)     Northwest Siberia       Reindeer breeding  8,000             II, III

ᵃ Total number of unrelated DNA samples used in this study
ᵇ Laakso 1991, Kolga et al. 2001, Karafet et al. 2002

To study the maternal neutral genetic diversity and evolutionary relationships of different North Eurasian human populations, we assessed the mtDNA HVS-I and HVS-II region sequences between positions 16024–16383 and 72–340, respectively. In addition, we analyzed seven mtDNA coding region SNP markers to confirm the observed mtDNA control region lineages (II). To assess the paternal neutral genetic diversity and dispersal among the North Eurasian populations, we used 17 Y-chromosome-specific SNP markers describing the paternal haplogroup distribution along with 12 Y-chromosome-specific microsatellite markers, with four additional Y STRs analysed in the Finnish population (I, II). For the haplotype analysis of the lactase persistence T-13910 allele among populations, eight SNPs and one indel polymorphism with minor allele frequencies (MAF) > 0.07 distributed across a 30kb region of the LCT gene were used, with additional sequences (~700kb) flanking the whole LCT gene region in particular individuals (III). To disentangle the allele and haplotype distribution of the clinically significant cytochrome P450 CYP2C and CYP2D gene subfamily regions we used 55 and 97 SNP markers with MAF > 0.05 in dbSNP with a mean spacing of 7.8kb and 7.6kb, respectively (IV). All the genotyping methods are described in detail in the original publications (I–IV).

3.3 DATA ANALYSIS

Population diversity indices, allele frequencies, Hardy-Weinberg (HW) equilibrium, and population pairwise FST or RST values along with the exact test of population differentiation and the analysis of molecular variance (AMOVA) were estimated using Arlequin software v3.0 (Excoffier et al. 2005) (I–IV). Phylogenetic median-joining networks were constructed using the program package Network (www.fluxus-…) and when required locus weights described by Bandelt et al. (2002) or Bosch et al. (2006) were used (II, IV). To estimate the coalescence age of specific lineages within a uniparental network, the ρ-statistic along with mutation rates from Forster et al. (1996) and Saillard et al. (2000) were implemented (II). To define and test the uniparental phylogeographic structures both spatial analysis of molecular variance (SAMOVA; Dupanloup et al. 2002) and autocorrelation indices for DNA analysis (AIDA; Bertorelle and Barbujani 1995) were performed (II). Importantly, correlations between mtDNA and Y chromosome distance matrices (II) as well as between FST and population recombination rate delta distances (IV) were estimated using the Mantel test (Excoffier et al. 2005). Allele frequencies and uniparental lineages were also geographically visualized using the MapView 6.0 program (StatSoft™) (II). Moreover, pairwise FST and RST values were visualized using the multidimensional scaling (MDS) procedure implemented in the STATISTICA software package (StatSoft™) (II, IV). For each population, autosomal haplotypes were inferred separately using either the Arlequin software (Excoffier et al. 2005, III) or the PHASE v.2.1 software package with 1000 iterations (Stephens et al. 2001, III–IV). Moreover, the recombination rate parameter ρ was inferred separately for each population and genomic region using PHASE v.2.1 with 1000 iterations (Stephens et al. 2001, IV). Non-parametric Spearman correlations between population recombination estimates and the Wilcoxon test for adjacent SNP r² values between populations were performed with SPSS 7.0 (IV). In addition, most of the autosomal genotype data modifications were performed prior to analysis with Perl scripts and Perl 5.8.7 (IV).
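The Mantel test mentioned above compares two distance matrices by permuting the population labels of one of them. The following is a minimal permutation sketch using only the Python standard library; it is not the Arlequin implementation used in the studies, and the function names, the number of permutations and the two small matrices are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def mantel(dist_a, dist_b, permutations=999, seed=0):
    """Permutation Mantel test for two symmetric distance matrices.

    Returns the observed matrix correlation and a one-sided p-value for a
    positive association. Rows and columns of dist_b are permuted jointly.
    """
    n = len(dist_a)
    upper = [(i, j) for i in range(n) for j in range(i + 1, n)]
    flat_a = [dist_a[i][j] for i, j in upper]

    def corr(order):
        return pearson(flat_a, [dist_b[order[i]][order[j]] for i, j in upper])

    rng = random.Random(seed)
    observed = corr(list(range(n)))
    hits = 0
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)
        if corr(order) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical pairwise distances between four populations (illustrative only).
mtdna = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]
ychr = [[0, 2, 4, 6], [2, 0, 2, 4], [4, 2, 0, 2], [6, 4, 2, 0]]
r_obs, p_value = mantel(mtdna, ychr)
```

With only four populations there are just 24 possible label orders, so the p-value cannot become very small even for perfectly correlated matrices. Real analyses rely on many more populations.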


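The ρ-statistic used for dating lineages in section 3.3 is the average number of mutational steps separating the sampled haplotypes from the founder haplotype of a cluster, which is then multiplied by a calibrated rate. The sketch below is a simplified illustration: the function names and the example cluster are hypothetical, and the value of 20,180 years per HVS-I transition is the calibration commonly attributed to Forster et al. (1996), treated here as an assumed input. The standard error described by Saillard et al. (2000) is not reproduced.

```python
def rho_statistic(lineages):
    """Average number of mutational steps from the founder haplotype.

    lineages: iterable of (mutational_steps_from_founder, number_of_samples).
    """
    lineages = list(lineages)
    total = sum(count for _, count in lineages)
    return sum(steps * count for steps, count in lineages) / total


def coalescence_age(rho, years_per_mutation):
    """Convert rho into an age estimate in years."""
    return rho * years_per_mutation


# Hypothetical star-like cluster: 10 samples carry the founder haplotype,
# 6 are one step away and 4 are two steps away from it.
example_cluster = [(0, 10), (1, 6), (2, 4)]
rho = rho_statistic(example_cluster)

# Assumed calibration (years per mutation); replace with the rate appropriate
# for the marker set being analysed.
age_years = coalescence_age(rho, 20180)
```

The estimate depends directly on the chosen founder haplotype and on the mutation rate, so both should be stated whenever an age is reported.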

4.1 UNIPARENTAL GENETIC LANDSCAPE IN NORTH EURASIA (I, II)

Mitochondrial and Y-chromosome studies suggest that not only Southwestern Europe (Semino et al. 2000, Torroni et al. 2001) but also Central Asia (Wells et al. 2001, Zerjal et al. 2002, Comas et al. 2004, Quintana-Murci et al. 2004) and South Siberia (Derenko et al. 2003; 2007ab) have had an important role in the early settlement of modern humans into North Eurasia. However, the genetic roots and dispersals of the North Eurasian Finno-Ugric-speaking populations are not entirely clear (Cavalli-Sforza et al. 1994, Derbeneva et al. 2002, Karafet et al. 2002, Norio 2003b, Ross et al. 2006).

To explore uniparental neutral variation among the North Eurasian Finno-Ugric-speaking populations and situate them into the North Eurasian genetic landscape, 42 and 33 Eurasian mtDNA and Y chromosome population samples were analyzed, respectively. In addition, Y chromosome STR haplotypes from 15 Eurasian populations were used in further comparisons (Pimenoff et al. unpublished data of 85 Finno-Ugric-speaking Erza, Moksha and Udmurt individuals were also included).

In our analysis, geographically associated uniparental haplotypes showed statistically significant frequency trends along the East-West axis of North Eurasia (study II, Figure 1AB, 2). This is congruent with the current view of the clinal distribution of West and East Eurasian uniparental lineages (Richards et al. 2000, Semino et al. 2000, Underhill et al. 2000, Wells et al. 2001, Kivisild et al. 2002, Metspalu et al. 2004, Rootsi et al. 2007). Correspondence analysis also revealed east-west patterns of North Eurasian maternal lineages (study II, Figure 4A), where Finno-Ugric-speaking populations form distinct clusters at an edge of the mtDNA haplogroup distribution. However, the geographical pattern is not so clear within the Y chromosome, except for the clustering of Finno-Ugric and Samoyedic populations together along with the Yakut population (study II, Figure 4B). Similarly (study II, Figure 3AB), mtDNA pairwise FST distances identify Finno-Ugric-speakers as distinct clusters between Northeast Europe and South Siberia/Central Asia, while the Y-chromosome RST distances appeared less structured. Even when the Erza, Moksha and Udmurt Y chromosomes (Pimenoff et al. unpublished) are added, the RST distances show the Finno-Ugric populations totally dispersed with no clear structure. A Mantel test between mtDNA and Y-chromosome pairwise distances showed a nonsignificant correlation.

Figure 5. Distribution of the geographically associated A) mtDNA and B) Y-chromosome lineages among the Finno-Ugric-speaking Finns (fin), Khanty (kha), Komi (kom), Mansi (man) and Saami (saa) populations. Colors for the associated haplogroups are the following: white, West Eurasian; gray, East Eurasian; black, South Asian (geographical classification based on study II). Population abbreviations refer to the same samples and abbreviations used in study II (Figure 1AB), except in 5B (saa2; Pimenoff et al. unpublished data).

Indeed, most of the Finno-Ugric-speaking populations were shown to possess both West and East Eurasian associated uniparental lineages (Figure 5AB, see also study II, Figure 1AB). This unique amalgamation of West and East Eurasian gene pools may indicate either a mixed origin of these populations from genetically distinguishable Eastern and Western Eurasia or that North Eurasia was initially colonized by humans carrying both West and East Eurasian lineages. Previous studies of the Saami and Finns support the idea of mixed origin in these populations (Norio et al. 2003b, Ross et al. 2006, Johansson et al. 2006, Ingman and Gyllensten 2007). However, the Central Asian (Karafet et al. 2002, Comas et al. 2004), Southwest Asian (Quintana-Murci et al. 2004) and South Siberian (Karafet et al. 2002, Derenko et al. 2003; 2006; 2007b) populations have shown an admixture of West and East Eurasian lineages with higher overall genetic diversity compared to Finno-Ugric-speakers. This supports the idea that the edge of North Eurasia was colonized through Central Asia/South Siberia by human groups already carrying West and East Eurasian lineages. Moreover, the observed mitochondrial U7 and Y-chromosome N2 sublineages indicate a more recent gene flow probably from Central Asia to North Eurasia (Rootsi et al. 2007). In this context it could be explained

ed into six geographical subpopulations a clear local reduced heterogeneity in Eastern Finland and a significant genetic difference between Western and Eastern Finland were observed (Figure 6; see also study I, Table 3, Lappalainen et al. 2006, Palo et al. 2007).

Apart from the unique genetic amalgamation and clinal distribution of western and eastern uniparental lineages, the North Eurasian Finno-Ugric-speaking populations are genetically a heterogeneous group, mostly showing lower haplotype diversities among and within unipa-
that the geographical region from Central rental haplogroups when compared to more
Asia to Northeast Europe and Northwest southern populations (see also Karafet et
Siberia has been a contact zone of genet- al. 2002, Derenko et al. 2003; 2006; 2007b,
ically distinguishable Western and East- Comas et al. 2004). In a broader perspec-
ern Eurasian lineages formed by recur- tive, the ensuing loss of genetic diversity
ring migrations and admixture of distinct in populations living in Arctic and Boreal
population groups. This unique admixture regions compared to more southern areas
is currently seen in North Eurasian Finno- has been clearly demonstrated in a range
Ugric-speaking populations of other species as well (Hewitt 2001). It is
When the number of observed Y-chro- explained by the presence of southern ref-
mosome STR haplotypes was divided by uges through the Last Glacial Maximum (c.
the number of sampled individuals with- 13,000 years ago) and subsequent founder
in a population, a lower fraction of hetero- effects, gene flow and genetic drift shaping
geneity was observed in the Finns (43%), the genetic diversity of small population
Khanty (43%), Mansi (52%) and Saa- groups migrating to the North Eurasia.
mi (52%) compared to Finno-Ugric Erza
(81%), Komi (69%), Moksha (86%) and
Udmurt (67%) or most other more south- 4.2 DISTRIBUTION OF LACTASE
ern populations. The difference in heteroge- PERSISTENCE ALLELE IN NORTH
neity could be explained by the at least four EURASIA (III)
times smaller population size (Table 1) and
thus the greater sensitivity to genetic drift Most humans cannot digest lactose, i.e.
in the Khanty, Mansi and Saami compared the main milk carbohydrate, after wean-
to the Erza, Moksha, Komi and Udmurt ing due to the natural reduced activity of
populations. However, in the Finns it is not the lactase-phlorizin hydrolase (LPH) en-
the population size but a probable popula- zyme in intestinal cells (Sahi et al. 1973,
tion bottleneck in the founding of the Finn- reviewed by Swallow 2003). These indi-
ish population (Sajantila et al. 1996, Laher- viduals are considered as lactase non-per-
mo et al. 1999), which explains the reduced sistent (LNP) [MIM223100]. However,
Y-chromosome heterogeneity (Figure 6). some people maintain the LPH activity
Moreover, when the samples were divid- throughout life, i.e. are lactase persistent


Figure 6. Distribution of the most common Y-chromosome 16-loci STR haplotypes within six subpopu-
lations of Finland (study I, Figure 3B).

(LP) [MIM223100], and thus can use milk products without metabolic difficulty. Interestingly, among the North Eurasian and some sub-Saharan African populations, often with high dairy product consumption, lactase persistence is relatively common (> 80%), whereas lactase non-persistence is predominant among the rest of the populations worldwide (Swallow 2003). Previously a single SNP C/T-13910 was shown to correlate completely with the LP/LNP phenotype among the North Europeans (Enattah et al. 2002). This T-13910 variant correlating with LP is located 14 kb upstream of the LCT gene, which encodes LPH (Enattah et al. 2002). Previous haplotype analysis showed that all LP alleles among Finns originated from one common ancestor, indicating a single introduction of the lactase persistence allele into North


Europe (Enattah et al. 2002). It was also confirmed that LNP is the ancestral state in humans, as in most mammals, and that mutations causing LP have probably arisen through recent positive selection as an adaptation to the energy needs of a lactose-rich diet coincident with animal domestication (Hollox et al. 2001, Enattah et al. 2002, Beja-Pereira et al. 2003, Bersaglieri et al. 2004, Myles et al. 2005, Tishkoff et al. 2007).

To investigate the allelic background of the LP variant T-13910 in North Eurasia, we genotyped eight SNPs and one indel polymorphism, including the C/T-13910 variant, covering ~30 kb of the LCT region in 37 worldwide populations (study III, Figure 1). Our results showed a high frequency of the LP T-13910 allele especially among the Finno-Ugric-speaking Finns (58%), Udmurt (33%), Moksha (28%) and Erza (27%) (study III, Table 3). Interestingly, the reindeer-breeding Khanty (Ob-Ugric, 3%), Mansi (Ob-Ugric, 3%) and Saami (17%) exhibit the lowest LP T-13910 allele frequencies among the Finno-Ugrics compared to the agriculturalists, with the exception of the Komi (15%; N=10) (Table 1).

Furthermore, we identified nine different haplotypes carrying the T-13910 LP variant and 14 haplotypes carrying the C-13910 LNP variant, each with a frequency of > 4% in at least one of the populations (study III, Table 5). One of the nine LP haplotypes dominates among LP alleles in most populations including the Finno-Ugrics, in which the highest frequencies are among the agriculturalists. Six other LP haplotypes were also observed at reasonable frequencies (between 2% and 11%) in the Finno-Ugric-speaking populations, mostly among the agriculturalists.

A median-joining network constructed from these common haplotypes (MAF > 4%) revealed two distinct clusters of LP haplotypes carrying the T-13910 allele (Figure 7). The first cluster was observed only among the Finno-Ugric Udmurts (15%), Mokshas (11%) and Erzas (5%) along with Iranians (5%), while the second cluster of LP haplotypes, including the dominant H98 LP haplotype, was observed in almost all populations (Figure 7; study III, Table 5). This observation of multiple LP haplotype clusters among the Finno-Ugric-speaking populations indicates two independent origins of the LP T-13910 allele in North Eurasia. We also identified a probable single LNP haplotype as the background from which the most common major LP haplotype was derived. This LNP background haplotype shows the highest frequency among the Finno-Ugric-speaking populations (between 33 and 35%) along with Han Chinese (36%), which might indicate an East Eurasian origin of this particular ancestral haplotype. However, molecular and demographic factors may bias interpretation based purely on population frequencies. The age estimates showed that the common LP T-13910 haplotype cluster (12,000–5,000 BP) is older than the LP T-13910 haplotype cluster (3,000–1,400 BP) mainly observed among the North Eurasian Finno-Ugric-speaking agriculturalist populations. This LP T-13910 allele and haplotype frequency distribution in Finno-Ugric-speaking populations, along with previous reports among the sub-Saharan populations (Tishkoff et al. 2007, Ingram et al. 2007), strongly implies that the LP T-13910 variant has been introduced independently more than once into North Eurasia. Moreover, the observed results support the role of still-ongoing convergent evolution of lactase persistence among the Finno-Ugrics in response to adult milk consumption coincident with the change in subsistence at the edge of North Eurasia (Table 1).
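
The frequency criterion applied to the haplotypes in this section (a frequency of > 4% in at least one population) can be illustrated with a minimal Python sketch. The function name `common_haplotypes`, the population labels and all frequencies below are hypothetical and were invented for this illustration; they are not data from study III.

```python
from typing import Dict, List


def common_haplotypes(
    freqs: Dict[str, Dict[str, float]], threshold: float = 0.04
) -> List[str]:
    """Return haplotypes whose frequency exceeds `threshold`
    in at least one population (sorted for reproducibility)."""
    keep = set()
    for hap_freqs in freqs.values():
        for hap, freq in hap_freqs.items():
            if freq > threshold:
                keep.add(hap)
    return sorted(keep)


# Hypothetical haplotype frequencies per population (illustration only).
example = {
    "popA": {"H1": 0.50, "H2": 0.03, "H3": 0.01},
    "popB": {"H1": 0.40, "H2": 0.05, "H3": 0.02},
}

print(common_haplotypes(example))  # ['H1', 'H2']
```

In this toy input, H2 is retained because it exceeds 4% in one of the two populations, whereas H3 stays below the threshold everywhere and is dropped.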


Figure 7. MJ-network of common (MAF> 4%) LNP/LP haplotypes constructed from eight SNPs and one indel marker across the 30kb LCT gene region among 37
worldwide populations (study IV, Figure 5). The arrow denotes the root of the network. LNP haplotypes are shown in yellow and LP haplotypes in green.
The sizes of the circles are proportional to the estimated haplotype frequencies. Haplotype frequencies for the Finno-Ugric-speaking populations discussed in the
text are shown. The positions of the C/T-13910 and G/A-22018 alleles within the haplotype are shown in the box. The SNPs have been coded for each site as 1 for the
ancestral SNP and 2 for the derived SNP.
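
The 1/2 coding described in the caption allows each haplotype to be written as a string of nine characters. The Python sketch below counts the number of differing sites between such strings, which is the kind of pairwise distance that network methods build on. It does not implement the median-joining algorithm itself. The haplotype labels and strings are hypothetical, and coding the indel in the same 1/2 way as the SNPs is an assumption made only for this illustration.

```python
from itertools import combinations


def n_differences(hap_a: str, hap_b: str) -> int:
    """Count the sites at which two equally long haplotype strings differ."""
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must have the same number of markers")
    return sum(a != b for a, b in zip(hap_a, hap_b))


# Hypothetical nine-marker haplotypes: 1 = ancestral state, 2 = derived state.
haps = {
    "hapA": "111111111",
    "hapB": "111121111",
    "hapC": "211121112",
}

# Pairwise number of differing sites for every pair of haplotypes.
pairwise = {
    (p, q): n_differences(haps[p], haps[q])
    for p, q in combinations(sorted(haps), 2)
}

for pair, dist in pairwise.items():
    print(pair, dist)
```

Haplotypes separated by a single difference would be joined directly in a network, while larger distances imply intermediate, possibly unsampled, haplotypes.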

4.3 PATTERNS OF LD IN CYP2C AND CYP2D GENE SUBFAMILY REGIONS IN EUROPE (IV)

The cytochrome P450 oxidase gene family comprises a set of evolutionarily related genes that code for xenobiotic metabolism enzymes (Ingelman-Sundberg 2004). In humans, genes within the CYP2C and CYP2D regions of the cytochrome P450 gene subfamily code for the CYP2C8, CYP2C9, CYP2C18, CYP2C19 and CYP2D6 drug-metabolizing enzymes (DMEs) (Wilkinson 2005). These genes are highly polymorphic, with several known genetic variants associated with variable drug reactions of significant clinical relevance (Lewis 2004).

To characterize the recombination rate variation, LD distribution and haplotype structure in the CYP2C and CYP2D regions, we genotyped 144 SNPs across these two regions in Finno-Ugric-speaking Saami and Finns along with nine other European and one African population (study IV). A further aim was to disentangle the past molecular and population genetic processes responsible for the observed LD distribution, drawing on known differences in the demographic history of Saami and Finns compared to other European populations. In agreement with results obtained from other marker analyses (Cavalli-Sforza et al. 1994, Ross et al. 2006, Lao et al. 2008), the Finno-Ugric-speaking Saami and Finns showed significantly different CYP2C and

Figure 8. Multidimensional scaling (MDS) of population pairwise FST distances between 11 European
populations across CYP2C and CYP2D gene regions (stress value = 0.073).




Figure 9. Recombination rate estimates across the A) CYP2C and B) CYP2D regions. The Y axis is expressed
in log-scaled units of recombination rate (4Nr). The dashed lines represent the upper and lower 95%
confidence intervals of the 10-fold average recombination rate among the European populations at
either region. The position of genes is shown as horizontal black bars on top of each graph.

CYP2D allele frequencies from other European populations (Figure 8). For the rest of the European populations, including the CEPH sample which represents the general European population in the HapMap project, the observed locus-specific and population pairwise FST values indicate a low degree of allele frequency differentiation for the two cytochrome P450 regions.

Figure 10. The haplotype blocks identified at the CYP2C (A) and CYP2D (B) loci based on Gabriel et al. (2002) are shown as bars. Bars containing diagonal lines are those identified within the extended LD region at both loci. Empty bars are LD blocks characterized outside the extended LD region. The position of genes is shown as horizontal black bars (only CYP2 genes identified) below the depicted chromosome. Vertical arrows denote the estimated recombination hotspots R1–R3.

Figure 11. Decay of r² against the distance (kb) between marker pairs within the extended LD region defined as logarithmic best-fit curves along the A) CYP2C and B) CYP2D regions. Population abbreviations are as reported in study IV (Table 1).

The estimated patterns of recombination rate variation revealed a significant but lower correlation among European populations compared to correlations observed between continental groups (Evans and Cardon 2005). Regardless of the allele frequency differences and recombination rate heterogeneity among the studied populations, the location and magnitude of the detected recombination hotspots R1–R3 are conserved in all 11 European populations (Figure 9) and the African Mandenka population. Interestingly, the CEPH European reference sample shows a recombination profile very similar to that of other European populations, suggesting that a fine-scale recombination map inferred from the HapMap CEPH data would be applicable to other European populations. However, the loci with lower recombination rates exhibit more variation in the rates between populations, indicating differences either in recombination histories or in past demography. In


more detail, the Saami show the lowest recombination rate profile for the CYP2D but not for the CYP2C region, suggesting that recombination rate estimates are also shaped by evolutionary factors other than population history alone (Figure 9).

In previous studies the extended LD within the CYP2C region has been shown to be conserved across populations, although the particular LD block structure is still under debate (Ahmadi et al. 2005, Walton et al. 2005, Vormfelde et al. 2007). Similarly, our data show a clear extended LD across the CYP2C region (Figure 10A) but with a varying pattern of adjacent marker r² values across the populations. Moreover, the LD block structure is consistent across populations within the CYP2C region, with few exceptions (Figure 10A). Firstly, the sub-Saharan Mandenka population shows significantly the shortest LD blocks (Figure 10A) with the lowest proportion of high LD (study IV, Table 1), which is due to the longer period of recombination compared to European populations, who migrated out of Africa and experienced a demographic bottleneck. Secondly, the CEPH sample shows a different pattern of LD block distribution (Figure 10A) and the lowest proportion of high LD among Europeans (study IV, Table 1), while the Saami exhibit the highest proportion of high LD, significantly different adjacent marker r² values and the longest single LD block among all populations (Figure 10A). Moreover, the decay of high LD within the CYP2C extended LD region is slowest in the Saami and fastest in the Mandenka sample (or the CEPH sample within Europe) (Figure 11A).

Within the CYP2D region a clear extended LD region is also observed (Figure 10B), although the adjacent marker r² values and the LD block distribution are more heterogeneous compared to the CYP2C region. The sub-Saharan Mandenka show significantly the shortest LD blocks and the lowest proportion of high LD, while the Saami again show the highest proportion of high LD (study IV, Table 1), a significantly different pattern of adjacent marker r² values and the longest single LD block (Figure 10B). The decay of LD within the CYP2D region is slowest in the Saami, but this is only seen at distances below 50 kb, as thereafter the decay is much faster in the Saami and Finns compared to other populations. It is also noteworthy that the Saami possess the lowest number of CYP2C and CYP2D haplotypes but higher CYP2D haplotype diversity compared to most other Europeans (study IV, Table 1).

The Saami population has remained small and constant in size throughout its history and is considered to have an admixed West and East Eurasian genetic origin (Ross et al. 2006). The high and extended LD with fluctuating haplotype diversity in the Finno-Ugric-speaking Saami could be linked to the small and constant effective population size with admixture and/or subsequent long-term genetic drift shaping the genetic diversity and LD compared to other European populations (Ross et al. 2006). Both admixture and drift have been shown to generate enhanced but random patterns of genetic diversity and LD (Terwilliger et al. 1998, Ardlie et al. 2002).

Moreover, a CYP2C19*2 A-19154 mutation causing reduced activity of the CYP2C19 DME shows a significantly higher allele frequency in Saami (95% CI: 28.1–47.4%) compared to other European populations (95% CI: 13.4–16.0%; Xie et al. 1999). Interestingly, similarly high frequencies (≥ 0.30) of the altered activity allele are observed in Central and East Asia (Xiao et al. 1997, Kimura et al. 1998, Wedlund et al. 2000, Niu et al. 2004). This, assuming an East Eurasian origin of the CYP2C19*2 A-19154 mutation, indicates a significant Asian


contribution to the Saami gene pool, similar to that observed for HLA markers (Johansson et al. 2008), although the combined effect of selection and drift may have enhanced the level of the A-19154 allele frequency.

The Finno-Ugric-speaking Saami, based on the lower level of recombination and higher extended LD within the CYP2C and CYP2D regions, exhibit an optimal genetic structure to tag haplotypes more efficiently and with larger genomic coverage compared to the other populations analyzed. This strategy of LD-based drift mapping, originally proposed by Terwilliger et al. (1998), may offer a great advantage for further identification of alleles associated with common complex pharmacogenetic traits.
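
The pairwise LD measure r² referred to throughout this section can be illustrated for two biallelic markers with a short Python sketch. It uses the standard definition r² = D² / (pA(1 − pA) pB(1 − pB)) with D = pAB − pA pB, and it assumes that phased two-marker haplotype counts are available. The function name `r_squared` and the counts are hypothetical and are not taken from study IV.

```python
def r_squared(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> float:
    """Compute r^2 between two biallelic markers from the counts of the
    four possible two-marker haplotypes (AB, Ab, aB, ab)."""
    n = n_AB + n_Ab + n_aB + n_ab
    if n == 0:
        raise ValueError("no haplotypes observed")
    p_AB = n_AB / n           # frequency of the AB haplotype
    p_A = (n_AB + n_Ab) / n   # frequency of allele A at the first marker
    p_B = (n_AB + n_aB) / n   # frequency of allele B at the second marker
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("r^2 is undefined when a marker is monomorphic")
    d = p_AB - p_A * p_B      # linkage disequilibrium coefficient D
    return d * d / denom


# Hypothetical counts: complete association and no association.
print(r_squared(50, 0, 0, 50))    # 1.0
print(r_squared(25, 25, 25, 25))  # 0.0
```

Plotting such r² values against the physical distance between marker pairs gives the kind of decay curve summarized in Figure 11.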



The major aim of this thesis was to examine the origins and distribution of uniparental and autosomal genetic variation among the Finno-Ugric-speaking human populations living in the Boreal and Arctic regions of North Eurasia. In more detail, I aimed to disentangle the underlying molecular and population genetic factors which have produced the patterns of uniparental and autosomal genetic diversity in these populations. Among Finno-Ugrics, the genetic amalgamation and clinal distribution of West and East Eurasian gene pools were observed within uniparental markers. This admixture indicates that North Eurasia was colonized through Central Asia/South Siberia by human groups already carrying both West and East Eurasian lineages. The complex combination of founder effects, gene flow and genetic drift underlying the genetic diversity of the Finno-Ugric-speaking populations was emphasized by low haplotype diversity within and among uniparental and biparental markers. A high prevalence of the lactase persistence allele among the North Eurasian Finno-Ugric agriculturalist populations was also shown, indicating a local adaptation to a subsistence change with a lactose-rich diet. Moreover, the haplotype background of the lactase persistence allele among the Finno-Ugric-speakers strongly suggested that the lactase persistence T-13910 mutation was introduced independently more than once to the North Eurasian gene pool. A significant difference in genetic diversity, haplotype structure and LD distribution within the cytochrome P450 CYP2C and CYP2D regions revealed the unique gene pool of the Finno-Ugric Saami, created mainly by population genetic processes, compared to other Europeans and the sub-Saharan Mandenka population. Of all studied populations, the Saami also showed significantly the highest allele frequency of a CYP2C19 gene mutation causing variable drug reactions. The diversity patterns observed within the CYP2C and CYP2D regions emphasize the strong effect of demographic history in shaping genetic diversity and LD, especially among such small and constant-size populations as the Finno-Ugric-speaking Saami. Moreover, the increased LD in Saami due to genetic drift and/or admixture was shown to offer an advantage for further attempts to identify alleles associated with common complex pharmacogenetic traits.

A challenge in future studies of human genome variation is to understand the molecular basis of common complex diseases and of variable sensitivity to drugs, pathogens and other environmental factors, now that recent developments in genotyping and genomic resequencing have enabled high-throughput genome-wide studies such as the HapMap (International HapMap Consortium 2007) and 1000 Genomes (Kaiser 2008). These studies aim primarily at validating genetic variation without ascertainment bias, and secondarily at exploring the evolutionary factors shaping genetic diversity. Both large-scale genotyping projects and fine-scale resequencing studies of restricted genome regions assessed in different populations are needed for further refinements of the recombination hotspot and LD block structures within the haplotype map of the human genome. A solid understanding of human genomic variation and the haplotype structure within it will enable further determination of our evolutionary past and enhance the identification of genetic variants underlying common complex traits and diseases.



This study was carried out between 2002–2008 in the Department of Forensic Medicine at the University of Helsinki and as a visiting scientist in the Evolutionary Biology Unit at the University of Pompeu Fabra, along with a one and a half year vacation fulfilling the National Military Service. The study was financially supported by The Finnish Cultural Foundation, the Federation of European Biochemical Societies and the Finnish Graduate School in Population Genetics, along with grants from the EU and the University of Helsinki.

I wish to thank the former head of the Forensic Department, Professor Antti Penttilä, and the new director, Professor Erkki Vuori, for providing excellent research facilities with a great academic atmosphere. I also wish to thank Professor Jaume Bertranpetit for inviting me to visit the inspiring Evolutionary Biology Unit at the University of Pompeu Fabra.

The deepest gratitude I wish to express to my supervisor Professor Antti Sajantila at the Department of Forensic Medicine. When I finally managed to meet up with the world famous Finno-Ugric professor I really got excited. Your personal enthusiasm and deep knowledge concerning North Eurasian Finno-Ugric populations and human population genetics in general hooked me. Since then I have learned a lot from you about how to conduct scientific work. Your broad understanding and interest in science, genetics and life itself have constantly carried on and stimulated my own sometimes restless and narrow mind. Your patience with me has been unbelievable, especially when things have gone wrong. I have been very lucky to have had the opportunity to work under your supervision.

I am also greatly indebted to my second supervisor Professor David Comas for a chance to work with him in an ever enjoyable atmosphere. Your valuable advice and broad knowledge in human population genetics have kept me on the right track. Moreover, your never ending good humour and social skills to survive with the whining PhD student have always impressed me.

Professor Ulf Gyllensten and Professor Pekka Pamilo deserve enormous compliments for carefully reviewing my thesis during their summer holiday season and for their valuable comments. I am also indebted to Professor Kimmo Kontula for important advice on conducting my final years of PhD studies.

It has always been inspiring and challenging to work in the OLL-BIO laboratory at the Department of Forensic Medicine. I have learned everything about genotyping and forensic laboratory work during these years in Kytösuontie 11. Especially I want to thank my colleague Minttu Hedman, with whom I have not only shared an office and scientific papers but also moments of scientific joy while filling bureaucratic applications or glasses of wine. Jukka Palo is also greatly thanked for revising most of my scientific writings and restlessly explaining to me the basics of population genetics. I also want to express my sincere gratitude to the rest of the former or current oll-bio members: Antti L, Hanna, Silvia, Johanna, Anna, Yukiko, Eve, Teija, Kirsti, Pia, Helmuth, Katarina, Hannu and Mikko. Thank you for all the great moments and good coffee breaks.

I also had the great pleasure to take part in the LD-EUROPE project and work with great scientists: Laurent Excoffier, Gillaume Lavall, Howard Cann, Sir Walter Bodmer, Susan Tonks, Irina Evseeva, Alberto Piazza, Francesca Crobu, Silvana Santachiara-Benerecetti, Ornella Semino, Anna Gonzalez-Neira and Claudia de Toma. Thank you for your collaboration.

I also wish to thank all the bioevo-people in Pompeu Fabra with whom I had the opportunity to enjoy scientifically inspiring moments around the coffee machine, and obviously all the GFs in Bitacora: Monica, Roger, Stephanie, Ferran, Ixa, Urko, Elodie, Hafid, Araceli, Chiara, Andrés, Karla, Michelle, Gemma, Carles, Josep, Ricardo, Marta, Belen, Anna F, Anna R, Olga, Rui, Begoña, Oscar, Elena, Fracesc and Arcadi. Thank you and hope to see you soon.

Special BCN thanks go to Martin and Renia for being superkind hosts for homeless Finns and always being eager to share common weekend adventures. Bruno and Jordi. It’s always a joy even just to talk with you and enjoy life while watching operación triumfo.

Numerous friends in Capoeira Forca Natural are greatly acknowledged for various relaxing moments and aching muscles outside the scientific world. Antti Korpisaari and Jussi Korhonen, thank you for your archaeological inspiration and intelligent company while digging treasures in eastern Finland. Risto and Pyry, you are greatly acknowledged for keeping an old guy sane in the Finnish forest. Juha and Aki, the two Hakunila musketeers, thank you for sharing fantastic journeys and private discussions. Gerald Matei (1974–2007), your never ending energy to get people to smile and enjoy life is greatly missed. And for Sampsa and Petri, thank you for your friendship. I also wish to thank all relatives. Especially Konsta, your visual eye and hilarious humour were more than greatly appreciated during the printing of this thesis.

My warmest gratitude and sincere thanks are due to my family. Dear Heli and Georg. Thank you for all your mental and physical care. And do not worry, as your prodigal son will return. Vilma and Wander. Thank you for all the late dinners and great moments, which have kept me going.

Agnes “Oma”, dear grandmother (1910–2008). I wish you would be here to share a cup of tea with lemon.


Abondolo D (1998) The Uralic Languages. London and New York
Ahmadi KR, Weale ME, Xue ZY, Soranzo N, Yarnall DP, Briley JD, Maruyama Y, Kobayashi M, Wood NW, Spurr NK, Burns DK, Roses AD, Saunders AM, Goldstein DB (2005) A single-nucleotide polymorphism tagging set for human drug metabolism and transport. Nat Genet 37:84–89
Akey JM, Zhang G, Zhang K, Jin L, Shriver MD (2002) Interrogating a high-density SNP map for signatures of natural selection. Genome Res 12:1805–1814
Anderson S, Bankier AT, Barrell BG, de Bruijn MH, Coulson AR, Drouin J, Eperon IC, Nierlich DP, Roe BA, Sanger F, Schreier PH, Smith AJ, Staden R, Young IG (1981) Sequence and organization of the human mitochondrial genome. Nature 290:457–465
Andrews RM, Kubacka I, Chinnery PF, Lightowlers RN, Turnbull DM, Howell N (1999) Reanalysis and revision of the Cambridge reference sequence for human mitochondrial DNA. Nat Genet 23:147
Ardlie KG, Kruglyak L, Seielstad M (2002) Patterns of linkage disequilibrium in the human genome. Nat Rev Genet 3:299–309
Armour JA, Anttinen T, May CA, Vega EE, Sajantila A, Kidd JR, Kidd KK, Bertranpetit J, Paabo S, Jeffreys AJ (1996) Minisatellite diversity supports a recent African origin for modern humans. Nat Genet 13:154–160
Arnheim N, Calabrese P, Nordborg M (2003) Hot and cold spots of recombination in the human genome: The reason we should find them and how this can be achieved. Am J Hum Genet 73:5–16
Bandelt HJ, Quintana-Murci L, Salas A, Macaulay V (2002) The fingerprint of phantom mutations in mitochondrial DNA data. Am J Hum Genet 71:1150–1160
Bandelt HJ, Richards M, Macaulay V (eds) (2006) Human Mitochondrial DNA and the Evolution of Homo Sapiens. Springer, Berlin Heidelberg New York
Barbujani G, Magagni A, Minch E, Cavalli-Sforza LL (1997) An apportionment of human DNA diversity. Proc Natl Acad Sci U S A 94:4516–4519
Beck S, Trowsdale J (2000) The human major histocompatibility complex: Lessons from the DNA sequence. Annu Rev Genomics Hum Genet 1:117–137
Beja-Pereira A, Luikart G, England PR, Bradley DG, Jann OC, Bertorelle G, Chamberlain AT, Nunes TP, Metodiev S, Ferrand N, Erhardt G (2003) Gene-culture coevolution between cattle milk protein genes and human lactase genes. Nat Genet 35:311–313
Bergman I (2004) Deglaciation and colonization: Pioneer settlements in Northern Fennoscandia. Journal of World Prehistory:155–177
Bersaglieri T, Sabeti PC, Patterson N, Vanderploeg T, Schaffner SF, Drake JA, Rhodes M, Reich DE, Hirschhorn JN (2004) Genetic signatures of strong recent positive selection at the lactase gene. Am J Hum Genet 74:1111–1120
Bertorelle G, Barbujani G (1995) Analysis of DNA diversity by spatial autocorrelation. Genetics 140:811–819
Bertorelle G, Excoffier L (1998) Inferring admixture proportions from molecular data. Mol Biol Evol 15:1298–1311
Bertranpetit J, Calafell F, Comas D, Gonzalez-Neira A, Navarro A (2003) Structure of linkage disequilibrium in humans: Genome factors and population stratification. Cold Spring Harb Symp Quant Biol 68:79–88
Bosch E, Calafell F, Gonzalez-Neira A, Flaiz C, Mateu E, Scheil HG, Huckenbeck W, Efremovska L, Mikerezi I, Xirotiris N, Grasa C, Schmidt H, Comas D (2006) Paternal and maternal lineages in the Balkans show a homogeneous landscape over linguistic barriers, except for the isolated Aromuns. Ann Hum Genet 70:459–487
Brinkmann B, Klintschar M, Neuhuber F, Huhne J, Rolf B (1998) Mutation rate in human microsatellites: Influence of the structure and length of the tandem repeat. Am J Hum Genet 62:1408–1415
Bustamante CD, Fledel-Alon A, Williamson S, Nielsen R, Hubisz MT, Glanowski S, Tanenbaum DM, White TJ, Sninsky JJ, Hernandez RD, Civello D, Adams MD, Cargill M, Clark AG (2005) Natural selection on protein-coding genes in the human genome. Nature


Cann HM, de Toma C, Cazes L, Legrand MF, Morel V, Piouffre L, Bodmer J et al. (2002) A human genome diversity cell line panel. Science 296:261–262
Cann RL, Stoneking M, Wilson AC (1987) Mitochondrial DNA and human evolution. Nature 325:31–36
Carlson CS, Eberle MA, Kruglyak L, Nickerson DA (2004) Mapping complex disease loci in whole-genome association studies. Nature 429:446–452
Carroll SB (2003) Genetics and the making of Homo sapiens. Nature 422:849–857
Cavalli-Sforza LL (2005) The human genome diversity project: Past, present and future. Nat Rev Genet 6:333–340
Cavalli-Sforza LL, Menozzi P, Piazza A (1994) The History and Geography of Human Genes. Princeton, NJ Princeton University Press
Cavalli-Sforza LL, Wilson AC, Cantor CR, Cook-Deegan RM, King MC (1991) Call for a worldwide survey of human genetic diversity: A vanishing opportunity for the human genome project. Genomics 11:490–491
Chakraborty R, Weiss KM (1988) Admixture as a tool for finding linked genes and detecting that difference from allelic association between loci. Proc Natl Acad Sci U S A 85:9119–9123
Chi PB, Duggal P, Kao WH, Mathias RA, Grant AV, Stockton ML, Garcia JG, Ingersoll RG, Scott AF, Beaty TH, Barnes KC, Fallin MD (2006) Comparison of SNP tagging methods using empirical data: Association study of 713
Comas D, Plaza S, Wells RS, Yuldaseva N, Lao O, Calafell F, Bertranpetit J (2004) Admixture, migrations, and dispersals in Central Asia: Evidence from maternal DNA lineages. Eur J Hum Genet 12:495–504
Conrad DF, Jakobsson M, Coop G, Wen X, Wall JD, Rosenberg NA, Pritchard JK (2006) A worldwide survey of haplotype variation and linkage disequilibrium in the human genome. Nat Genet 38:1251–1260
Cooper A, Poinar HN (2000) Ancient DNA: Do it right or not at all. Science 289:1139
Crawford DC, Bhangale T, Li N, Hellenthal G, Rieder MJ, Nickerson DA, Stephens M (2004) Evidence for substantial fine-scale variation in recombination rates across the human genome. Nat Genet 36:700–706
Daly MJ, Rioux JD, Schaffner SF, Hudson TJ, Lander ES (2001) High-resolution haplotype structure in the human genome. Nat Genet 29:229–232
Darwin CR (1859) The Origin of Species by Means of Natural Selection. John Murray, London.
de Knijff P (2000) Messages through bottlenecks: On the combined use of slow and fast evolving polymorphic markers on the human Y chromosome. Am J Hum Genet 67:1055–1061
de la Chapelle A, Wright FA (1998) Linkage disequilibrium mapping in isolated populations: The example of Finland revisited. Proc Natl Acad Sci U S A 95:12416–12423
Deloukas P, Bentley D (2004) The HapMap project and its application to genetic studies of
SNPs on chromosome 12q14.3–12q24.21 for drug response. Pharmacogenomics J 4:88–90
asthma and total serum IgE in an African Ca- Denoeud F, Vergnaud G, Benson G (2003) Pre-
ribbean population. Genet Epidemiol 30:609– dicting human minisatellite polymorphism.
619 Genome Res 13:856–867
Chimpanzee Sequencing and Analysis Consor- Derbeneva OA, Starikovskaya EB, Wallace DC,
tium (2005) Initial sequence of the chimpan- Sukernik RI (2002) Traces of early Eurasians
zee genome and comparison with the human in the Mansi of Northwest Siberia revealed by
genome. Nature 437:69–87 mitochondrial DNA analysis. Am J Hum Gen-
Clark AG (1990) Inference of haplotypes from et 70:1009–1014
PCR-amplified samples of diploid popula- Derenko M, Malyarchuk B, Denisova GA,
tions. Mol Biol Evol 7:111–122 Wozniak M, Dambueva I, Dorzhu C, Luzina F,
Clark AG (2004) The role of haplotypes in can- Miscicka-Sliwka D, Zakharov I (2006) Con-
didate gene studies. Genet Epidemiol 27:321– trasting patterns of Y-chromosome variation in
333 South Siberian populations from Baikal and
Collins FS, Green ED, Guttmacher AE, Guyer Altai-Sayan regions. Hum Genet 118(5):591–
MS, US National Human Genome Research 604
Institute (2003) A vision for the future of ge- Derenko M, Malyarchuk B, Denisova G, Wozniak
nomics research. Nature 422:835–847 M, Grzybowski T, Dambueva I, Zakharov I
(2007a) Y-chromosome haplogroup N disper-
sals from South Siberia to Europe. J Hum Gen-
et 52:763–770


Nikun06.indd 43 24.9.2008 16:41:13
Derenko M, Malyarchuk B, Grzybowski T, Den- Forster P, Harding R, Torroni A, Bandelt HJ
isova G, Dambueva I, Perkova M, Dorzhu (1996) Origin and evolution of native Amer-
C, Luzina F, Lee HK, Vanecek T, Villems R, ican mtDNA variation: A reappraisal. Am J
Zakharov I (2007b) Phylogeographic analysis Hum Genet 59:935–945
of mitochondrial DNA in Northern Asian pop- Gabriel SB, Schaffner SF, Nguyen H, Moore JM,
ulations. Am J Hum Genet 81:1025–1041 Roy J, Blumenstiel B, Higgins J, DeFelice M,
Derenko MV, Grzybowski T, Malyarchuk BA, Lochner A, Faggart M, Liu-Cordero SN, Roti-
Dambueva IK, Denisova GA, Czarny J, Dor- mi C, Adeyemo A, Cooper R, Ward R, Lander
zhu CM, Kakpakov VT, Miscicka-Sliwka D, ES, Daly MJ, Altshuler D (2002) The struc-
Wozniak M, Zakharov IA (2003) Diversity of ture of haplotype blocks in the human genome.
mitochondrial DNA lineages in South Siberia. Science 296:2225–2229
Ann Hum Genet 67:391–411 Gibson J, Morton NE, Collins A (2006) Extended
Ding K, Zhou K, Zhang J, Knight J, Zhang X, tracts of homozygosity in outbred human pop-
Shen Y (2005) The effect of haplotype-block ulations. Hum Mol Genet 15:789–795
definitions on inference of haplotype-block Goebel T, Derevianko AP, Petrin VT (1993) Dat-
structure and htSNPs selection. Mol Biol Evol ing the middle-to-upper-paleolithic transition
22:148–159 at Kara-Bom. Curr Anthropol 34:452–458
Dolukhanov PM, Shukurov AM, Tarasov PE, Goldstein DB (2001) Islands of linkage disequi-
Zaitseva GI (2002) Colonization of North- librium. Nat Genet 29:109–111
ern Eurasia by modern humans: Radiocarbon Goldstein DB, Cavalleri GL (2005) Genom-
chronology and environment. Journal of Ar- ics: Understanding human diversity. Nature
chaeological Science 29:593–606 437:1241–1242
Dupanloup I, Schneider S, Excoffier L (2002) A Gonzalez-Neira A, Ke X, Lao O, Calafell F, Na-
simulated annealing approach to define the varro A, Comas D, Cann H, Bumpstead S,
genetic structure of populations. Mol Ecol Ghori J, Hunt S, Deloukas P, Dunham I, Car-
11:2571–2581 don LR, Bertranpetit J (2006) The portability
Ellegren H (2004) Microsatellites: Simple se- of tagSNPs across populations: A worldwide
quences with complex evolution. Nat Rev survey. Genome Res 16:323–330
Genet 5:435–445 Greely HT (2001a) Human genome diversity:
Enattah NS, Sahi T, Savilahti E, Terwilliger JD, What about the other human genome project?
Peltonen L, Jarvela I (2002) Identification of Nat Rev Genet 2:222–227
a variant associated with adult-type hypolacta- Greely HT (2001b) Informed consent and other
sia. Nat Genet 30:233–237 ethical issues in human population genetics.
Evans DM, Cardon LR (2005) A comparison of Annu Rev Genet 35:785–800
linkage disequilibrium patterns and estimat- Green RE, Krause J, Ptak SE, Briggs AW, Ronan
ed population recombination rates across mul- MT, Simons JF, Du L, Egholm M, Rothberg
tiple populations. Am J Hum Genet 76:681– JM, Paunovic M, Paabo S (2006) Analysis of
687 one million base pairs of Neanderthal DNA.
Excoffier L, Laval G, Schneider S (2005) Arle- Nature 444:330–336
quin ver. 3.0: An integrated software package Greenberg JH (2000) Indo-European and its Co-
for population genetics data analysis. Evolu- sest Relatives; the Eurasiatic Language Family.
tionary Bioinformatics Online 1:47–50 Stanford, Stanford University Press Volume 1,
Excoffier L, Slatkin M (1995) Maximum-likeli- Grammar
hood estimation of molecular haplotype fre- Guglielmino CR, Piazza A, Menozzi P, Cavalli-
quencies in a diploid population. Mol Biol Sforza LL (1990) Uralic genes in Europe. Am
Evol 12:921–927 J Phys Anthropol 83:57–68
Finnila S, Lehtonen MS, Majamaa K (2001) Phy- Haldane JBS (1924) A mathematical theory of
logenetic network for European mtDNA. Am J natural and artificial selection. Transactions
Hum Genet 68:1475–1484 of the Cambridge philosophical society Part
Fisher RA (1930) The Genetical Theory of Natu- I:19–41
ral Selection. Clarendon Press, Oxford, Oxford Halldorsson BV, Bafna V, Lippert R, Schwartz R,
University Press, Oxford De La Vega FM, Clark AG, Istrail S (2004)
Optimal haplotype block-free selection of tag-
ging SNPs for genome-wide association stud-
ies. Genome Res 14:1633–1640


Nikun06.indd 44 24.9.2008 16:41:13
Hastbacka J, de la Chapelle A, Kaitila I, Sistonen Ingram CJ, Elamin MF, Mulcare CA, Weale
P, Weaver A, Lander E (1992) Linkage dis- ME, Tarekegn A, Raga TO, Bekele E, Elamin
equilibrium mapping in isolated founder popu- FM, Thomas MG, Bradman N, Swallow DM
lations: Diastrophic dysplasia in Finland. Nat (2007) A novel polymorphism associated with
Genet 2:204–211 lactose tolerance in Africa: Multiple causes
Hedman M, Brandstatter A, Pimenoff V, Sistonen for lactase persistence? Hum Genet 120:779–
P, Palo JU, Parson W, Sajantila A (2007) Finn- 788
ish mitochondrial DNA HVS-I and HVS-II International HapMap Consortium (2003) The in-
population data. Forensic Sci Int 172:171– ternational HapMap project. Nature 426:789–
178 796
Hewitt GM (1999) Post-glacial recolonization of International HapMap Consortium (2005) A
European biota. Biological Journal of the Lin- haplotype map of the human genome. Nature
nean Society:87–112 437:1299–1320
Hewitt GM (2001) Speciation, hybrid zones and International HapMap Consortium, Frazer KA,
phylogeography – or seeing genes in space Ballinger DG, Cox DR, Hinds DA, Stuve LL,
and time. Mol Ecol 10:537–549 Gibbs RA et al. (2007) A second generation
Heyer E, Puymirat J, Dieltjes P, Bakker E, de human haplotype map of over 3.1 million
Knijff P (1997) Estimating Y chromosome SNPs. Nature 449:851–861
specific microsatellite mutation frequencies International Human Genome Sequencing Con-
using deep rooting pedigrees. Hum Mol Gen- sortium (2004) Finishing the euchromatic se-
et 6:799–803 quence of the human genome. Nature 431:931–
Hill WG, Weir BS (1994) Maximum-likelihood 945
estimation of gene location by linkage dis- Jeffreys AJ, Kauppi L, Neumann R (2001) In-
equilibrium. Am J Hum Genet 54:705–714 tensely punctate meiotic recombination in the
Hinds DA, Stuve LL, Nilsen GB, Halperin E, class II region of the major histocompatibility
Eskin E, Ballinger DG, Frazer KA, Cox DR complex. Nat Genet 29:217–222
(2005) Whole-genome patterns of common Jeffreys AJ, Neumann R (2002) Reciprocal cross-
DNA variation in three human populations. over asymmetry and meiotic drive in a human
Science 307:1072–1079 recombination hot spot. Nat Genet 31:267–
Hollox EJ, Poulter M, Zvarik M, Ferak V, Krause 271
A, Jenkins T, Saha N, Kozlov AI, Swallow DM Jeffreys AJ, Ritchie A, Neumann R (2000) High
(2001) Lactase haplotype diversity in the old resolution analysis of haplotype diversity and
world. Am J Hum Genet 68:160–172 meiotic crossover in the human TAP2 recom-
Howell N, Oostra RJ, Bolhuis PA, Spruijt L, bination hotspot. Hum Mol Genet 9:725–733
Clarke LA, Mackey DA, Preston G, Herrn- Jobling MA, Hurles M, Tyler-Smith C (2003) Hu-
stadt C (2003) Sequence analysis of the mito- man evolutionary genetics: Origins, peoples
chondrial genomes from Dutch pedigrees with and disease. Garland Science
Leber hereditary optic neuropathy. Am J Hum Jobling MA, Pandya A, Tyler-Smith C (1997) The
Genet 72:1460–1469 Y chromosome in forensic analysis and pater-
Ingelman-Sundberg M (2004) Human drug me- nity testing. Int J Legal Med 110:118–124
tabolising cytochrome P450 enzymes: Proper- Jobling MA, Tyler-Smith C (2003) The human Y
ties and polymorphisms. Naunyn Schmiede- chromosome: An evolutionary marker comes
bergs Arch Pharmacol 369:89–104 of age. Nat Rev Genet 4:598–612
Ingman M, Gyllensten U (2007) A recent genetic Johansson A, Ingman M, Mack SJ, Erlich H, Gyl-
link between Sami and the Volga-Ural region lensten U (2008) Genetic origin of the Swed-
of Russia. Eur J Hum Genet 15:115–120 ish Sami inferred from HLA class I and class
Ingman M, Kaessmann H, Paabo S, Gyllensten II allele frequencies. Eur J Hum Genet (in
U (2000) Mitochondrial genome variation press)
and the origin of modern humans. Nature Johansson A, Vavruch-Nilsson V, Cox DR, Fraz-
408:708–713 er KA, Gyllensten U (2007) Evaluation of the
SNP tagging approach in an independent pop-
ulation sample--array-based SNP discovery in
Sami. Hum Genet 122:141–150


Nikun06.indd 45 24.9.2008 16:41:13
Johansson A, Vavruch-Nilsson V, Edin-Liljegren Kayser M, Kittler R, Erler A, Hedman M, Lee AC,
A, Sjolander P, Gyllensten U (2005) Linkage Mohyuddin A, Mehdi SQ, Rosser Z, Stonek-
disequilibrium between microsatellite mark- ing M, Jobling MA, Sajantila A, Tyler-Smith C
ers in the Swedish Sami relative to a world- (2004) A comprehensive survey of human Y-
wide selection of populations. Hum Genet chromosomal microsatellites. Am J Hum Gen-
116:105–113 et 74:1183–1197
Johnson GC, Esposito L, Barratt BJ, Smith AN, Kayser M, Roewer L, Hedman M, Henke L, Hen-
Heward J, Di Genova G, Ueda H, Cordell HJ, ke J, Brauer S, Kruger C, Krawczak M, Nagy
Eaves IA, Dudbridge F, Twells RC, Payne M, Dobosz T, Szibor R, de Knijff P, Stoneking
F, Hughes W, Nutland S, Stevens H, Carr P, M, Sajantila A (2000) Characteristics and fre-
Tuomilehto-Wolf E, Tuomilehto J, Gough SC, quency of germline mutations at microsatel-
Clayton DG, Todd JA (2001) Haplotype tag- lite loci from the human Y chromosome, as re-
ging for the identification of common disease vealed by direct observation in father/son pairs.
genes. Nat Genet 29:233–237 Am J Hum Genet 66:1580–1588
Jorde LB (2000) Linkage disequilibrium and the Kelley JL, Madeoy J, Calhoun JC, Swanson W,
search for complex disease genes. Genome Akey JM (2006) Genomic signatures of posi-
Res 10:1435–1444 tive selection in humans and the limits of out-
Jorde LB, Rogers AR, Bamshad M, Watkins WS, lier approaches. Genome Res 16:980–989
Krakowiak P, Sung S, Kere J, Harpending HC Kere J (2001) Human population genetics: Les-
(1997) Microsatellite diversity and the demo- sons from Finland. Annu Rev Genomics Hum
graphic history of modern humans. Proc Natl Genet 2:103–128
Acad Sci U S A 94:3100–3103 Khusnutdinova EK, Viktorova TV, Fatkhlislamo-
Jorde LB, Watkins WS, Bamshad MJ, Dixon ME, va RI, Khidiatova IM (1999) Restriction poly-
Ricker CE, Seielstad MT, Batzer MA (2000) morphism of the major non-decoding region
The distribution of human genetic diversity: of mitochondrial DNA in human populations
A comparison of mitochondrial, autosomal, from the Volga-Ural region. Genetika 35:695–
and Y-chromosome data. Am J Hum Genet 702
66:979–988 Kidd KK, Pakstis AJ, Speed WC, Kidd JR (2004)
Kaessmann H, Paabo S (2002) The genetical his- Understanding human DNA sequence varia-
tory of humans and the great apes. J Intern tion. J Hered 95:406–420
Med 251:1–18 Kim Y, Stephan W (2002) Detecting a local sig-
Kaessmann H, Zollner S, Gustafsson AC, Wie- nature of genetic hitchhiking along a recom-
be V, Laan M, Lundeberg J, Uhlen M, Paabo bining chromosome. Genetics 160:765–777
S (2002) Extensive linkage disequilibrium Kimura M (1968) Evolutionary rate at the molec-
in small human populations in Eurasia. Am J ular level. Nature 217:624–626
Hum Genet 70:673–685 Kimura M, Ieiri I, Mamiya K, Urae A, Higuchi S
Kaiser J (2008) DNA sequencing. A plan to cap- (1998) Genetic polymorphism of cytochrome
ture human diversity in 1000 genomes. Sci- P450s, CYP2C19, and CYP2C9 in a Japanese
ence 319:395 population. Ther Drug Monit 20:243–247
Karafet TM, Osipova LP, Gubina MA, Posukh Kittles RA, Bergen AW, Urbanek M, Virkkunen
OL, Zegura SL, Hammer MF (2002) High lev- M, Linnoila M, Goldman D, Long JC (1999)
els of Y-chromosome differentiation among na- Autosomal, mitochondrial, and Y chromo-
tive Siberian populations and the genetic sig- some DNA variation in Finland: Evidence for
nature of a boreal hunter-gatherer way of life. a male-specific bottleneck. Am J Phys Anthro-
Hum Biol 74:761–789 pol 108:381–399
Kauppi L, Sajantila A, Jeffreys AJ (2003) Re- Kittles RA, Perola M, Peltonen L, Bergen AW,
combination hotspots rather than population Aragon RA, Virkkunen M, Linnoila M, Gold-
history dominate linkage disequilibrium in man D, Long JC (1998) Dual origins of Finns
the MHC class II region. Hum Mol Genet revealed by Y chromosome haplotype varia-
12:33–40 tion. Am J Hum Genet 62:1171–1179
Kivisild T, Tolk HV, Parik J, Wang Y, Papiha SS,
Bandelt HJ, Villems R (2002) The emerging
limbs and twigs of the East Asian mtDNA
tree. Mol Biol Evol 19:1737–1751


Nikun06.indd 46 24.9.2008 16:41:13
Kolga M, Tõnurist I, Vaba L, Viikberg J (2001) Laitinen V, Lahermo P, Sistonen P, Savontaus
The Red Book of the Peoples of the Russian ML (2002) Y-chromosomal diversity suggests
Empire. Tallinn, NGO Red Book that Baltic males share common Finno-Ugric-
Kong A, Gudbjartsson DF, Sainz J, Jonsdottir speaking forefathers. Hum Hered 53:68–78
GM, Gudjonsson SA, Richardsson B, Sigur- Lander ES (1996) The new genomics: Global
dardottir S, Barnard J, Hallbeck B, Masson G, views of biology. Science 274:536–539
Shlien A, Palsson ST, Frigge ML, Thorgeirs- Lander ES, Linton LM, Birren B, Nusbaum C,
son TE, Gulcher JR, Stefansson K (2002) A Zody MC, Baldwin J, Devon K et al. (2001)
high-resolution recombination map of the hu- Initial sequencing and analysis of the human
man genome. Nat Genet 31:241–247 genome. Nature 409:860–921
Krebs CJ (1994) Ecology: The experimental anal- Landsteiner K (1931) Individual differences in
ysis of distribution and abundance. human blood. Science 73:403–409
Kruglyak L (1999) Prospects for whole-genome Lao O, Lu TT, Nothnagel M, Junge O, Freitag-
linkage disequilibrium mapping of common Wolf S, Caliebe A, Balascakova M, Bertran-
disease genes. Nat Genet 22:139–144 petit J, Bindoff LA, Comas D, Holmlund G,
Kruglyak L, Nickerson DA (2001) Variation is Kouvatsi A, Macek M, Mollet I, Parson W,
the spice of life. Nat Genet 27:234–236 Palo J, Ploski R, Sajantila A, Tagliabracci A,
Kryukov GV, Pennacchio LA, Sunyaev SR Gether U, Werge T, Rivadeneira F, Hofman A,
(2007) Most rare missense alleles are deleteri- Uitterlinden AG, Gieger C, Wichmann HE,
ous in humans: Implications for complex dis- Rüther A, Schreiber S, Becker C, Nürnberg P,
ease and association studies. Am J Hum Genet Nelson MR, Krawczak M, Kayser M (2008)
80:727–739 Correlation between genetic and geographic
Kutuev I, Khusainova R, Karunas A, Yunus- structure in Europe.Curr Biol. 18:1241–1248
bayev B, Fedorova S, Lebedev Y, Hunsmann Lappalainen T, Koivumaki S, Salmela E, Huo-
G, Khusnutdinova E (2006) From east to west: ponen K, Sistonen P, Savontaus ML, Laher-
Patterns of genetic diversity of populations mo P (2006) Regional differences among the
living in four Eurasian regions. Hum Hered Finns: A Y-chromosomal perspective. Gene
61:1–9 376:207–215
Kuzmin YV, Keates SG: Comment on “Coloniza- Lewis DF (2004) 57 varieties: The human cy-
tion of Northern Eurasia by Modern Humans: tochromes P450. Pharmacogenomics 5:305–
Radiocarbon Chronology and Environment” 318
by P.M. Dolukhanov, A.M. Shukurov, P.E. Lewontin RC (1964) The interaction of selection
Tarasov and G.I. Zaitseva. Journal of Archaeo- and linkage. I. general considerations; heterot-
logical Science 29, 593–606 (2002). Journal of ic models. Genetics 49:49–67
Archaeological Science, 2004; 31: 141–143 Lewontin RC (1972) The apportionment of hu-
Laakso J (1992) Uralilaiset kansat. Tietoa suomen man diversity. Evol Biol 6:381–398
sukukielistä ja niiden puhujista. WSOY, Juva Lewontin RC (1988) On measures of gametic dis-
Laan M, Paabo S (1997) Demographic history equilibrium. Genetics 120:849–852
and linkage disequilibrium in human popula- Lewontin RC, Kojima K (1960) The evolutionary
tions. Nat Genet 17:435–438 dynamics of complex polymorphisms. Evolu-
Laan M, Wiebe V, Khusnutdinova E, Remm M, tion 14:458–472
Paabo S (2005) X-chromosome as a marker Li N, Stephens M (2003) Modeling linkage dis-
for population history: Linkage disequilibri- equilibrium and identifying recombination
um and haplotype study in Eurasian popula- hotspots using single-nucleotide polymor-
tions. Eur J Hum Genet 13:452–462 phism data. Genetics 165:2213–2233
Lahermo P, Sajantila A, Sistonen P, Lukka M, Mahadevan M, Tsilfidis C, Sabourin L, Shutler
Aula P, Peltonen L, Savontaus ML (1996) G, Amemiya C, Jansen G, Neville C, Narang
The genetic relationship between the Finns M, Barcelo J, O’Hoy K (1992) Myotonic dys-
and the Finnish Saami (Lapps): Analysis of trophy mutation: An unstable CTG repeat in
nuclear DNA and mtDNA. Am J Hum Genet the 3’ untranslated region of the gene. Science
58:1309–1322 255:1253–1255
Lahermo P, Savontaus ML, Sistonen P, Beres J, May CA, Shone AC, Kalaydjieva L, Sajantila A,
de Knijff P, Aula P, Sajantila A (1999) Y chro- Jeffreys AJ (2002) Crossover clustering and
mosomal polymorphisms reveal founding lin- rapid decay of linkage disequilibrium in the
eages in the Finns and the Saami. Eur J Hum Xp/Yp pseudoautosomal gene SHOX. Nat
Genet 7:447–458 Genet 31:272–275


Nikun06.indd 47 24.9.2008 16:41:13
McVean G, Spencer CC, Chaix R (2005) Perspec- Nielsen R, Hellmann I, Hubisz M, Bustamante
tives on human genetic variation from the Hap- C, Clark AG (2007) Recent and ongoing se-
Map project. PLoS Genet 1:e54 lection in the human genome. Nat Rev Genet
Meinila M, Finnila S, Majamaa K (2001) Evi- 8:857–868
dence for mtDNA admixture between the finns Nielsen R, Williamson S, Kim Y, Hubisz MJ,
and the saami. Hum Hered 52:160–170 Clark AG, Bustamante C (2005) Genomic
Metspalu M, Kivisild T, Metspalu E, Parik J, scans for selective sweeps using SNP data.
Hudjashov G, Kaldma K, Serk P, Karmin M, Genome Res 15:1566–1575
Behar DM, Gilbert MT, Endicott P, Mastana Niu CY, Luo JY, Hao ZM (2004) Genetic poly-
S, Papiha SS, Skorecki K, Torroni A, Villems morphism analysis of cytochrome P4502C19
R (2004) Most of the extant mtDNA bound- in Chinese Uigur and Han populations. Chin J
aries in South and Southwest Asia were likely Dig Dis 5:76–80
shaped during the initial settlement of Eurasia Nordqvist B (2000) Coastal adaptations in the
by anatomically modern humans. BMC Gen- mesolitic. A study of costal sites with organic
et 5:26 remains from the Boreal and Atlantic periods
Michalatos-Beloin S, Tishkoff SA, Bentley KL, in Western Sweden. GOTARC Series B 13
Kidd KK, Ruano G (1996) Molecular haplo- Norio R (2003a) Finnish disease heritage I: Char-
typing of genetic markers 10 kb apart by al- acteristics, causes, background. Hum Genet
lele-specific long-range PCR. Nucleic Acids 112:441–456
Res 24:4841–4843 Norio R (2003b) Finnish disease heritage II: Pop-
Mir KU, Southern EM (2000) Sequence varia- ulation prehistory and genetic roots of Finns.
tion in genes and genomic DNA: Methods Hum Genet 112:457–469
for large-scale analysis. Annu Rev Genomics Norio R (2003c) The Finnish disease heritage III:
Hum Genet 1:329–360 The individual diseases. Hum Genet 112:470–
Mueller JC, Lohmussaar E, Magi R, Remm M, 526
Bettecken T, Lichtner P, Biskup S, Illig T, Ohta T (2002) Near-neutrality in evolution of
Pfeufer A, Luedemann J, Schreiber S, Pram- genes and gene regulation. Proc Natl Acad Sci
staller P, Pichler I, Romeo G, Gaddi A, Testa U S A 99:16134–16137
A, Wichmann HE, Metspalu A, Meitinger T Ota T, Kimura M (1973) A model of mutation
(2005) Linkage disequilibrium patterns and appropriate to estimate the number of electro-
tagSNP transferability among European pop- phoretically detectable alleles in a finite popu-
ulations. Am J Hum Genet 76:387–398 lation. Genet Res 22:201–204
Myers S, Bottolo L, Freeman C, McVean G, Don- Paabo S (1989) Ancient DNA: Extraction, char-
nelly P (2005) A fine-scale map of recombina- acterization, molecular cloning, and enzymat-
tion rates and hotspots across the human ge- ic amplification. Proc Natl Acad Sci U S A
nome. Science 310:321–324 86:1939–1943
Myles S, Bouzekri N, Haverfield E, Cherkaoui Palo JU, Hedman M, Ulmanen I, Lukka M, Sa-
M, Dugoujon JM, Ward R (2005) Genetic evi- jantila A (2007) High degree of Y-chromosom-
dence in support of a shared Eurasian-North al divergence within Finland – forensic aspects.
African dairying origin. Hum Genet 117:34– Forensic Sci Int Genet 1:120–124
42 Patil N, Berno AJ, Hinds DA, Barrett WA, Doshi
Nachman MW (2001) Single nucleotide poly- JM, Hacker CR, Kautzer CR, Lee DH, Marjor-
morphisms and recombination rate in humans. ibanks C, McDonough DP, Nguyen BT, Norris
Trends Genet 17:481–485 MC, Sheehan JB, Shen N, Stern D, Stokowski
Nachman MW, Crowell SL (2000) Estimate of RP, Thomas DJ, Trulson MO, Vyas KR, Fraz-
the mutation rate per nucleotide in humans. er KA, Fodor SP, Cox DR (2001) Blocks of
Genetics 156:297–304 limited haplotype diversity revealed by high-
Nevanlinna HR (1972) The Finnish population resolution scanning of human chromosome 21.
structure. A genetic and genealogical study. Science 294:1719–1723
Hereditas 71:195–236 Pavlov P, Svendsen JI, Indrelid S (2001) Human
Nevanlinna HR (1984) The roots of the Finns in presence in the European Arctic nearly 40,000
the light of genetic research into distinguish- years ago. Nature 413:64–67
ing genetic characteristics. Soc Sci Fenni- Peltonen L, Palotie A, Lange K (2000) Use of
ca:157–174 population isolates for mapping complex
traits. Nat Rev Genet 1:182–190


Nikun06.indd 48 24.9.2008 16:41:13
Perheentupa J (1995) The Finnish disease heri- Richards M, Macaulay V, Hickey E, Vega E,
tage: A personal look. Acta Paediatr 84:1094– Sykes B, Guida V, Rengo C et al. (2000) Trac-
1099 ing european founder lineages in the near east-
Phillips MS, Lawrence R, Sachidanandam R, ern mtDNA pool. Am J Hum Genet 67:1251–
Morris AP, Balding DJ, Donaldson MA, 1276
Studebaker JF et al. (2003) Chromosome- Romualdi C, Balding D, Nasidze IS, Risch G, Ro-
wide distribution of haplotype blocks and the bichaux M, Sherry ST, Stoneking M, Batzer
role of recombination hot spots. Nat Genet MA, Barbujani G (2002) Patterns of human
33:382–387 diversity, within and among continents, in-
Pimenoff V, Korpisaari A (2004) A preliminary ferred from biallelic DNA polymorphisms.
report on the genetic analysis of the osteologi- Genome Res 12:602–612
cal remains of Tiwanaku tombs in Tiraska and Rootsi S, Zhivotovsky LA, Baldovic M, Kayser
Aymara chullpa C17 in kewaya. Report sub- M, Kutuev IA, Khusainova R, Bermisheva
mitted to DINAAR (National Directorate of MA, Gubina M, Fedorova SA, Ilumae AM,
Archaeology and Anthropology, Bolivia). Khusnutdinova EK, Voevoda MI, Osipova
Pimenoff V, Sajantila A (2002) Genetic Tools in LP, Stoneking M, Lin AA, Ferak V, Parik J,
Molecular Anthropology and Archaeology. Kivisild T, Underhill PA, Villems R (2007)
The Roots of Peoples and Languages of the A counter-clockwise northern route of the
Northern Eurasia IV Ed. K. Julku, Gummerus Y-chromosome haplogroup N from south-
OY. Jyväskylä, Finland. east asia towards europe. Eur J Hum Genet
Pritchard JK, Stephens M, Donnelly P (2000) In- 15:204–211
ference of population structure using multilo- Rosenberg NA, Pritchard JK, Weber JL, Cann
cus genotype data. Genetics 155:945–959 HM, Kidd KK, Zhivotovsky LA, Feldman
Pult I, Sajantila A, Simanainen J, Georgiev O, MW (2002) Genetic structure of human popu-
Schaffner W, Paabo S (1994) Mitochondrial lations. Science 298:2381–2385
DNA sequences from switzerland reveal strik- Ross AB, Johansson A, Ingman M, Gyllensten U
ing homogeneity of european populations. Biol (2006) Lifestyle, genetics, and disease in sami.
Chem Hoppe Seyler 375:837–840 Croat Med J 47:553–565
Quintana-Murci L, Chaix R, Wells RS, Behar Rosser ZH, Zerjal T, Hurles ME, Adojaan M, Ala-
DM, Sayar H, Scozzari R, Rengo C, Al-Zah- vantic D, Amorim A, Amos W et al. (2000) Y-
ery N, Semino O, Santachiara-Benerecetti AS, chromosomal diversity in europe is clinal and
Coppa A, Ayub Q, Mohyuddin A, Tyler-Smith influenced primarily by geography, rather
C, Qasim Mehdi S, Torroni A, McElreavey K than by language. Am J Hum Genet 67:1526–
(2004) Where west meets east: The complex 1543
mtDNA landscape of the southwest and cen- Ruano G, Kidd KK, Stephens JC (1990) Haplo-
tral asian corridor. Am J Hum Genet 74:827– type of multiple polymorphisms resolved by
845 enzymatic amplification of single DNA mol-
Raitio M, Lindroos K, Laukkanen M, Pastinen ecules. Proc Natl Acad Sci U S A 87:6296–
T, Sistonen P, Sajantila A, Syvanen AC (2001) 6300
Y-chromosomal SNPs in finno-ugric-speaking Sabeti PC, Schaffner SF, Fry B, Lohmueller J,
populations analyzed by minisequencing on Varilly P, Shamovsky O, Palma A, Mikkelsen
microarrays. Genome Res 11:471–482 TS, Altshuler D, Lander ES (2006) Positive
Redon R, Ishikawa S, Fitch KR, Feuk L, Per- natural selection in the human lineage. Sci-
ry GH, Andrews TD, Fiegler H et al. (2006) ence 312:1614–1620
Global variation in copy number in the human Sabeti PC, Varilly P, Fry B, Lohmueller J,
genome. Nature 444:444–454 Hostetter E, Cotsapas C, Xie X et al. (2007)
Reich DE, Cargill M, Bolk S, Ireland J, Sabe- Genome-wide detection and characterization
ti PC, Richter DJ, Lavery T, Kouyoumjian of positive selection in human populations.
R, Farhadian SF, Ward R, Lander ES (2001) Nature 449:913–918
Linkage disequilibrium in the human genome. Sahi T, Isokoski M, Jussila J, Launiala K, Pyorala
Nature 411:199–204 K (1973) Recessive inheritance of adult-type
Reich DE, Lander ES (2001) On the allelic spec- lactose malabsorption. Lancet 2:823–826
trum of human disease. Trends Genet 17:502– Saillard J, Forster P, Lynnerup N, Bandelt HJ,
510 Norby S (2000) mtDNA variation among
greenland eskimos: The edge of the beringian
expansion. Am J Hum Genet 67:718–726


Nikun06.indd 49 24.9.2008 16:41:13
Sajantila A, Lahermo P, Anttinen T, Lukka M, Si- Syvanen AC (2001) Accessing genetic variation:
stonen P, Savontaus ML, Aula P, Beckman L, Genotyping single nucleotide polymorphisms.
Tranebjaerg L, Gedde-Dahl T, Issel-Tarver L, Nat Rev Genet 2:930–942
DiRienzo A, Paabo S (1995) Genes and lan- Tambets K, Rootsi S, Kivisild T, Help H, Serk P,
guages in europe: An analysis of mitochondri- Loogvali EL, Tolk HV et al. (2004) The west-
al lineages. Genome Res 5:42–52 ern and eastern roots of the saami--the story
Sajantila A, Lukka M, Syvanen AC (1999) Exper- of genetic “outliers” told by mitochondrial
imentally observed germline mutations at hu- DNA and Y chromosomes. Am J Hum Genet
man micro- and minisatellite loci. Eur J Hum 74:661–682
Genet 7:263–266 Terwilliger JD, Hiekkalinna T (2006) An utter
Sajantila A, Salem AH, Savolainen P, Bauer K, refutation of the “fundamental theorem of the
Gierig C, Paabo S (1996) Paternal and mater- HapMap”. Eur J Hum Genet 14:426–437
nal DNA lineages reveal a bottleneck in the founding of the Finnish population. Proc Natl Acad Sci U S A 93:12035–12039
Sanger F, Nicklen S, Coulson AR (1977) DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci U S A 74:5463–5467
Semino O, Passarino G, Oefner PJ, Lin AA, Arbuzova S, Beckman LE, De Benedictis G, Francalacci P, Kouvatsi A, Limborska S, Marcikiae M, Mika A, Mika B, Primorac D, Santachiara-Benerecetti AS, Cavalli-Sforza LL, Underhill PA (2000) The genetic legacy of Paleolithic Homo sapiens sapiens in extant Europeans: A Y chromosome perspective. Science 290:1155–1159
Service S, DeYoung J, Karayiorgou M, Roos JL, Pretorious H, Bedoya G, Ospina J et al. (2006) Magnitude and distribution of linkage disequilibrium in population isolates and implications for genome-wide association studies. Nat Genet 38:556–560
Slatkin M (2008) Linkage disequilibrium--understanding the evolutionary past and mapping the medical future. Nat Rev Genet 9:477–485
Slatkin M, Excoffier L (1996) Testing for linkage disequilibrium in genotypic data using the expectation-maximization algorithm. Heredity 76:377–383
Smith JM, Haigh J (1974) The hitch-hiking effect of a favourable gene. Genet Res 23:23–35
Stephens M, Smith NJ, Donnelly P (2001) A new statistical method for haplotype reconstruction from population data. Am J Hum Genet 68:978–989
Stumpf MP, Goldstein DB (2003) Demography, recombination hotspot intensity, and the block structure of linkage disequilibrium. Curr Biol 13:1–8
Swallow DM (2003) Genetics of lactase persistence and lactose intolerance. Annu Rev Genet 37:197–219
Terwilliger JD, Zollner S, Laan M, Paabo S (1998) Mapping genes through the use of linkage disequilibrium generated by genetic drift: ‘drift mapping’ in small populations with no demographic expansion. Hum Hered 48:138–154
Tishkoff SA, Dietzsch E, Speed W, Pakstis AJ, Kidd JR, Cheung K, Bonne-Tamir B, Santachiara-Benerecetti AS, Moral P, Krings M (1996) Global patterns of linkage disequilibrium at the CD4 locus and modern human origins. Science 271:1380–1387
Tishkoff SA, Kidd KK (2004) Implications of biogeography of human populations for ‘race’ and medicine. Nat Genet 36:S21–7
Tishkoff SA, Reed FA, Ranciaro A, Voight BF, Babbitt CC, Silverman JS, Powell K, Mortensen HM, Hirbo JB, Osman M, Ibrahim M, Omar SA, Lema G, Nyambo TB, Ghori J, Bumpstead S, Pritchard JK, Wray GA, Deloukas P (2007) Convergent adaptation of human lactase persistence in Africa and Europe. Nat Genet 39:31–40
Tishkoff SA, Verrelli BC (2003) Role of evolutionary history on haplotype block structure in the human genome: Implications for disease mapping. Curr Opin Genet Dev 13:569–575
Torroni A, Achilli A, Macaulay V, Richards M, Bandelt HJ (2006) Harvesting the fruit of the human mtDNA tree. Trends Genet 22:339–345
Torroni A, Bandelt HJ, D’Urbano L, Lahermo P, Moral P, Sellitto D, Rengo C, Forster P, Savontaus ML, Bonne-Tamir B, Scozzari R (1998) mtDNA analysis reveals a major late Paleolithic population expansion from southwestern to northeastern Europe. Am J Hum Genet 62:1137–1152
Torroni A, Bandelt HJ, Macaulay V, Richards M, Cruciani F, Rengo C, Martinez-Cabrera V et al. (2001) A signal, from human mtDNA, of postglacial recolonization in Europe. Am J Hum Genet 69:844–852
Underhill PA, Shen P, Lin AA, Jin L, Passarino G, Yang WH, Kauffman E, Bonne-Tamir B, Bertranpetit J, Francalacci P, Ibrahim M, Jenkins T, Kidd JR, Mehdi SQ, Seielstad MT, Wells RS, Piazza A, Davis RW, Feldman MW, Cavalli-Sforza LL, Oefner PJ (2000) Y chromosome sequence variation and the history of human populations. Nat Genet 26:358–361
Varilo T, Laan M, Hovatta I, Wiebe V, Terwilliger JD, Peltonen L (2000) Linkage disequilibrium in isolated populations: Finland and a young sub-population of Kuusamo. Eur J Hum Genet 8:604–612
Varilo T, Paunio T, Parker A, Perola M, Meyer J, Terwilliger JD, Peltonen L (2003) The interval of linkage disequilibrium (LD) detected with microsatellite and SNP markers in chromosomes of Finnish populations with different histories. Hum Mol Genet 12:51–59
Varilo T, Savukoski M, Norio R, Santavuori P, Peltonen L, Jarvela I (1996) The age of human mutation: Genealogical and linkage disequilibrium analysis of the CLN5 mutation in the Finnish population. Am J Hum Genet 58:506–512
Vasil’ev SA, Kuzmin YV, Orlova LA, Dementiev VN (2002) Radiocarbon-based chronology of the Paleolithic of Siberia and its relevance to the peopling of the New World. Radiocarbon 44(2):503–530
Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO et al. (2001) The sequence of the human genome. Science 291:1304–1351
Vilkki J, Savontaus ML, Nikoskelainen EK (1988) Human mitochondrial DNA types in Finland. Hum Genet 80:317–321
Vormfelde SV, Schirmer M, Toliat MR, Meineke I, Kirchheiner J, Nurnberg P, Brockmoller J (2007) Genetic variation at the CYP2C locus and its association with torsemide biotransformation. Pharmacogenomics J 7:200–211
Wall JD, Pritchard JK (2003) Haplotype blocks and linkage disequilibrium in the human genome. Nat Rev Genet 4:587–597
Walton R, Kimber M, Rockett K, Trafford C, Kwiatkowski D, Sirugo G (2005) Haplotype block structure of the cytochrome P450 CYP2C gene cluster on chromosome 10. Nat Genet 37:915–6; author reply 916
Wang ET, Kodama G, Baldi P, Moyzis RK (2006) Global landscape of recent inferred Darwinian selection for Homo sapiens. Proc Natl Acad Sci U S A 103:135–140
Wang N, Akey JM, Zhang K, Chakraborty R, Jin L (2002) Distribution of recombination crossovers and the origin of haplotype blocks: The interplay of population history, recombination, and mutation. Am J Hum Genet 71:1227–1234
Waples RS, Gaggiotti O (2006) What is a population? An empirical evaluation of some genetic methods for identifying the number of gene pools and their degree of connectivity. Mol Ecol 15:1419–1439
Watkins WS, Rogers AR, Ostler CT, Wooding S, Bamshad MJ, Brassington AM, Carroll ML, Nguyen SV, Walker JA, Prasad BV, Reddy PG, Das PK, Batzer MA, Jorde LB (2003) Genetic variation among world populations: Inferences from 100 Alu insertion polymorphisms. Genome Res 13:1607–1618
Weber JL (1990) Human DNA polymorphisms and methods of analysis. Curr Opin Biotechnol 1:166–171
Wedlund PJ (2000) The CYP2C19 enzyme polymorphism. Pharmacology 61:174–183
Weiss KM, Clark AG (2002) Linkage disequilibrium and the mapping of complex human traits. Trends Genet 18:19–24
Wells RS, Yuldasheva N, Ruzibakiev R, Underhill PA, Evseeva I, Blue-Smith J, Jin L et al. (2001) The Eurasian heartland: A continental perspective on Y-chromosome diversity. Proc Natl Acad Sci U S A 98:10244–10249
Wilkinson GR (2005) Drug metabolism and variability among patients in drug response. N Engl J Med 352:2211–2221
Williamson SH, Hubisz MJ, Clark AG, Payseur BA, Bustamante CD, Nielsen R (2007) Localizing recent adaptive evolution in the human genome. PLoS Genet 3:e90
Wright S (1931) Evolution in Mendelian populations. Genetics 16:97–159
Xiao ZS, Goldstein JA, Xie HG, Blaisdell J, Wang W, Jiang CH, Yan FX, He N, Huang SL, Xu ZH, Zhou HH (1997) Differences in the incidence of the CYP2C19 polymorphism affecting the S-mephenytoin phenotype in Chinese Han and Bai populations and identification of a new rare CYP2C19 mutant allele. J Pharmacol Exp Ther 281:604–609
Xie HG, Stein CM, Kim RB, Wilkinson GR, Flockhart DA, Wood AJ (1999) Allelic, genotypic and phenotypic distributions of S-mephenytoin 4’-hydroxylase (CYP2C19) in healthy Caucasian populations of European descent throughout the world. Pharmacogenetics
Xu X, Peng M, Fang Z (2000) The direction of microsatellite mutations is dependent upon allele length. Nat Genet 24:396–399
Y Chromosome Consortium (2002) A nomenclature system for the tree of human Y-chromosomal binary haplogroups. Genome Res 12:339–348
Zeggini E, Barton A, Eyre S, Ward D, Ollier W, Worthington J, John S (2005) Characterisation of the genomic architecture of human chromosome 17q and evaluation of different methods for haplotype block definition. BMC Genet 6:21
Zerjal T, Beckman L, Beckman G, Mikelsaar AV, Krumina A, Kucinskas V, Hurles ME, Tyler-Smith C (2001) Geographical, linguistic, and cultural influences on genetic diversity: Y-chromosomal distribution in northern European populations. Mol Biol Evol 18:1077–1087
Zerjal T, Dashnyam B, Pandya A, Kayser M, Roewer L, Santos FR, Schiefenhovel W, Fretwell N, Jobling MA, Harihara S, Shimizu K, Semjidmaa D, Sajantila A, Salo P, Crawford MH, Ginter EK, Evgrafov OV, Tyler-Smith C (1997) Genetic relationships of Asians and northern Europeans, revealed by Y-chromosomal DNA analysis. Am J Hum Genet 60:1174–1183
Zerjal T, Wells RS, Yuldasheva N, Ruzibakiev R, Tyler-Smith C (2002) A genetic landscape reshaped by recent events: Y-chromosomal insights into Central Asia. Am J Hum Genet 71:466–482
Zhang W, Collins A, Maniatis N, Tapper W, Morton NE (2002) Properties of linkage disequilibrium (LD) maps. Proc Natl Acad Sci U S A 99:17004–17007
Zhivotovsky LA, Rosenberg NA, Feldman MW (2003) Features of evolution and expansion of modern humans, inferred from genomewide microsatellite markers. Am J Hum Genet