Times of India All Indians have the same genes

Jan 1, 2012, 07.26AM IST

Kumarasamy Thangaraj, a world-famous scientist in human genetics, speaks on the history of the Indian population. Excerpts:

Q. The study of Indian ancestry that you did along with the former CCMB director Lalji Singh and two US researchers was published in Nature in September 2009. It is said to have rewritten the history of Indian population...

It established through genomic analyses that people in north India were no different from those in the south and that all shared the same genetic lineage. It also established that people of north and south were part of the same culture. We analyzed over 500,000 genetic markers across diverse groups, including the traditional "upper"/"lower" castes and tribal groups, and proved that there was no difference between tribal populations and castes, and it was impossible to make a distinction between them.

Q. Is the study being taken forward now? What are the findings that will come out in 2012?

We are working at breakneck speed. The study done so far was the background work and we will be going public with findings in 2012 that will be futuristic with a more immediate, practical connect with the healthcare sector and people in general. We are at a stage where we can discuss the basis of getting a particular disease. It is the genetic difference that makes one vulnerable or resistant to a disease such as malaria or even HIV. The finding may not yield a cure but prevention such as a pre-natal diagnostic test would be able to determine a child's vulnerability to certain diseases. Genome-based medicine is a possibility in the future. It would also help doctors decide if certain drugs should be used on individuals.

Q. Would it improve our understanding of diseases?

The findings in 2012 would enable an understanding of the genetic factors associated with diseases, particularly those that are seen only in India. Our studies have found population-specific, region-specific and India-specific medical conditions. Madras motor neuron disease is one, named so because it is found in the Chittoor-Tirupati-Chennai corridor. The population of Handikudu in Karnataka, which is entirely short-statured with bone abnormalities, is another example of a region-specific medical condition. Our findings in 2012 will explore whether Europeans contributed to the Ancient North Indian or vice versa. We will come up with a definitive answer to this question. We will also find out how nature affects the genome: the changes that the genetic makeup of a population undergoes to adjust to climatic conditions.

Q. Is there any particular medical condition better understood now with the help of this study?

Cardiomyopathy is one. It is a heart muscle disease rooted in a genetic mutation that exists in 4.5% of the Indian population. We found it in Pakistan, Sri Lanka, Indonesia and Malaysia. We didn't find it elsewhere in the world. We traced the mutation to 33,000 years ago in India. Since the Indian population remained isolated (few marriages with other nations), the disease exists in the Indian subcontinent and South Asia. And this finding has an inter-linked revelation: 50% of South Asians have Indian ancestry.

New research debunks Aryan invasion theory
Published: Saturday, Dec 10, 2011, 10:30 IST By Kumar Chellappan | Place: Chennai | Agency: DNA

In what could be a major setback to Dravidian parties in Tamil Nadu, intercontinental research in cellular molecular biology has debunked the Aryan invasion theory.

"We have conclusively proved that there never existed any Aryans or Dravidians in the Indian subcontinent. The Aryan-Dravidian classification was nothing but a misinformation campaign carried out by people with vested interests," Prof Lalji Singh, vice-chancellor, Banaras Hindu University, told DNA.

The findings of a three-year research effort by a team of scientists, including Prof Singh and others from various countries, have been published by the American Journal of Human Genetics in its issue dated December 9.

"The study effectively puts to rest the argument that south Indians are Dravidians and were driven to the peninsula by Aryans who invaded North India," said Prof Singh, a molecular biologist and former chief of the Centre for Cellular and Molecular Biology, Hyderabad.

According to Dr Gyaneshwer Chaubey, Estonian Biocentre, Tartu, Estonia, who was another Indian member of the team, the leaders of Dravidian political parties may have to find another answer for their raison d'être.

"We have proved that people all over India have common genetic traits and origin. All Indians have the same DNA structure. No foreign genes or DNA has entered the Indian mainstream in the last 60,000 years," Dr Chaubey said.

Dr Chaubey had proved in 2009 itself that the Aryan invasion theory is bunkum. "That was based on low resolution genetic markers. This time we have used autosomes, which means all major 23 chromosomes, for our studies. The decoding of human genome and other advances in this area help us in unraveling the ancestry in 60,000 years," he explained.

However, Gnani Shankaran, noted Dravidian thinker, said the time for writing the last word on Dravidian philosophy has not yet come. "We have to find out the credentials of the authors of this research paper and their hidden agenda. In Tamil Nadu, the Dravidian and Aryan ties are inter-related. The Dalits in our land are the descendants of the Dravidian Brahmins who were pushed to the lowest strata of society by the Aryans," Shankaran said.

According to Prof Singh, Dr Chaubey, and Dr Kumarasamy Thangaraj, another member of the team, the findings disprove the caste theory prevailing in India.
Interestingly, the team found that instead of Aryan invasion, it was Indians who moved from the subcontinent to Europe. "That's the reason behind the findings of the same genetic traits in Eurasian regions," said Dr Thangaraj, senior scientist, CCMB.

"Africans came to India through Central Asia during 80,000 to 60,000 BCE and they moved to Europe sometime around 30,000 BCE. The Indian Vedic literature and the epics are all silent about the Aryan-Dravidian conflict," said Dr S Kalyanaraman, a proponent of the Saraswathi civilization which developed along the banks of the now defunct River Saraswathi.

Genetic study finds no evidence for Aryan Migration Theory--On the contrary, South Indians migrated to north and South Asians migrated into Eurasia

What geneticists consider a landmark paper has just been published in a highly reputed scientific journal, American Journal of Human Genetics, authored by an international group of geneticists including Metspalu, Gyaneshwer Chaubey, Chandana Basu Mallick (Evolutionary Biology Group in Tartu, Estonia), Ramasamy Pitchappan (Chettinad Academy of Research and Education, Chennai), Lalji Singh, and Kumarasamy Thangaraj (CCMB, Hyderabad).

The study is titled: Shared and Unique Components of Human Population Structure and Genome-Wide Signals of Positive Selection in South Asia, The American Journal of Human Genetics (2011), doi:10.1016/j.ajhg.2011.11.010

The study is comprehensive, unlike previous studies of the human genome, and is unique because it focuses on a large number of populations in South Asia and India, a region which harbours one of the highest levels of genetic diversity in Eurasia and currently accounts for one sixth of the human population of the world. The study analysed human genetic variation in a sample of 1310 individuals belonging to 112 populations, using new genome-wide data containing more than 600,000 single nucleotide polymorphic sites among 142 samples from 30 ethnic groups of India.

The most important scientific findings of the study are:

• South Asian genetic diversity is second in the world, next only to Africa, mainly due to long periods of indigenous development of lineages and a complex population structure in which one can see the different caste and tribal populations.
• Two genetic components among Indians are observed: one is restricted to India and explains 50% of the genetic ancestry of Indian populations, while the second spread to West Asia and the Caucasus region.
Technically called "haplotype diversity", it is a measure of the origin of the genetic component. The component which spread beyond India has significantly higher haplotype diversity in India than in any other part of the world. This is clear proof that this genetic component originated in India and then spread to West Asia and the Caucasus.

• Haplotype diversity associated with dark green ancestry is greatest in the south of the Indian subcontinent, indicating that the alleles underlying it most likely arose there and spread northwards.
• The study refutes the Aryan migrations into India suggested by the German orientalist Max Muller, namely that ca. 3,500 years ago a dramatic migration of Indo-European speakers from Central Asia shaped contemporary South Asian populations and introduced the Indo-European language family and the caste system in India. A few past studies on mtDNA and Y-chromosome variation have interpreted their results in favor of the hypothesis, whereas others have found no genetic evidence to support it. The present study notes that any migration from Central Asia to South Asia should have introduced readily apparent signals of East Asian ancestry into India. The study finds that this ancestry component is absent from the region. The study, therefore, concludes that if such a dispersal ever took place at all, it should have occurred 12,500 years ago. On the contrary, there is evidence for an East Asian ancestry component reaching Central Asia at a later period.
• A remarkable finding is that the origin of these components in India is much older than 3500 years, which clearly refutes Aryan Invasion theory of the type enunciated by Max Mueller! The study also found that haplotypic diversity of this ancestry component is much greater than in Europe and the Near East (Iraq, Iran, Middle East), thus pointing to an older age of the component and/or long-term higher effective population size (that is, indigenous evolution of people). The distribution of two genetic components among Indians clearly indicates that the Aryan-Dravidian division is a myth. Indian population landscape is clearly governed by geography.
• India has one of the world's fastest growing incidences of type 2 diabetes as well as a sizeable number of cases of the metabolic syndrome, both of which have been linked to recent rapid urbanization. The study points to possible genetic reasons and recommends further research on four genes (DOK5, MSTN, CLOCK, PPARA) implicated in lipid metabolism and the etiology of type 2 diabetes.

S. Kalyanaraman, Dec. 2011

Shared and Unique Components of Human Population Structure and Genome-Wide Signals of Positive Selection in South Asia

Mait Metspalu [1,2,13], Irene Gallego Romero [3,13,14], Bayazit Yunusbayev [1,4,13], Gyaneshwer Chaubey [1], Chandana Basu Mallick [1,2], Georgi Hudjashov [1,2], Mari Nelis [5,6], Reedik Mägi [7,8], Ene Metspalu [2], Maido Remm [7], Ramasamy Pitchappan [9], Lalji Singh [10,11], Kumarasamy Thangaraj [10], Richard Villems [1,2,12] and Toomas Kivisild [1,2,3]

1 Evolutionary Biology Group, Estonian Biocentre, 51010 Tartu, Estonia
2 Department of Evolutionary Biology, Institute of Molecular and Cell Biology, University of Tartu, 51010 Tartu, Estonia
3 Department of Biological Anthropology, University of Cambridge, Cambridge CB2 1QH, UK
4 Institute of Biochemistry and Genetics, Ufa Research Center, Russian Academy of Sciences, and the Department of Genetics and Fundamental Medicine, Bashkir State University, 450054 Ufa, Russia
5 Department of Biotechnology, Institute of Molecular and Cell Biology, University of Tartu and Estonian Biocentre, 51010 Tartu, Estonia
6 Department of Genetic Medicine and Development, University of Geneva Medical School, 1211 Geneva, Switzerland
7 Department of Bioinformatics, Institute of Molecular and Cell Biology, University of Tartu, 51010 Tartu, Estonia
8 Genetic and Genomic Epidemiology Unit, Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK
9 Chettinad Academy of Research and Education, Chettinad Health City, Chennai 603 103, India
10 Centre for Cellular and Molecular Biology, Hyderabad 500 007, India
11 Banaras Hindu University, Varanasi 221 005, India
12 Estonian Academy of Sciences, Tallinn, Estonia
13 These authors contributed equally to this work
14 Present address: Department of Human Genetics, University of Chicago, 920 E 58th Street, CLSC 317, Chicago, IL 60637, USA

Abstract

South Asia harbors one of the highest levels of genetic diversity in Eurasia, which could be interpreted as a result of its long-term large effective population size and of admixture during its complex demographic history. In contrast to Pakistani populations, populations of Indian origin have been underrepresented in previous genomic scans of positive selection and population structure. Here we report data for more than 600,000 SNP markers genotyped in 142 samples from 30 ethnic groups in India. Combining our results with other available genome-wide data, we show that Indian populations are characterized by two major ancestry components, one of which is spread at comparable frequency and haplotype diversity in populations of South and West Asia and the Caucasus. The second component is more restricted to South Asia and accounts for more than 50% of the ancestry in Indian populations. Haplotype diversity associated with these South Asian ancestry components is significantly higher than that of the components dominating the West Eurasian ancestry palette. Modeling of the observed haplotype diversities suggests that both Indian ancestry components are older than the purported Indo-Aryan invasion 3,500 YBP. Consistent with the results of pairwise genetic distances among world regions, Indians share more ancestry signals with West than with East Eurasians. However, compared to Pakistani populations, a higher proportion of their genes show regionally specific signals of high haplotype homozygosity. Among such candidates of positive selection in India are MSTN and DOK5, both of which have potential implications in lipid metabolism and the etiology of type 2 diabetes.


Introduction

Understanding the genetic structure of mankind globally and the role of natural selection in shaping it are complex tasks that require data from multiple populations to represent the geographic range and environmental diversity of the inhabited world. Previous studies on South Asia have highlighted this region as having one of the highest levels of genetic diversity, second only to Africa. [1,2] Studies of haploid loci (mtDNA and the nonrecombining region of Y Chromosome [NRY]) have revealed that the South Asian genetic makeup is dominated by largely autochthonous lineages testifying for low levels of admixture with other parts of Eurasia because the peopling of the subcontinent some 50,000 to 70,000 years ago. [3,4,5,6] Notably, these genetic dates are earlier than the oldest confirmed human fossil in the subcontinent, found in Sri Lanka and dated to 31,000 years before present (YBP), [7] but postdate the archaeological evidence below and above the layers of ash from the Mount Toba volcanic supereruption associated with the Middle Palaeolithic tools that could have been produced by anatomically modern humans. [8,9] Recent archaeological evidence from the Jebel Faya site in the Arabian Peninsula permitted the authors to consider that the manufacturers of these tools could have dispersed into India as early as 125,000 YBP. [10,11] Whether the genes of the crafters of these Middle Palaeolithic tools still persist among modern populations is a lingering question.

Although the HapMap, [12] the Human Genome Diversity Project, [12,13] the 1000 Genomes Project [14] and the Human Genome Organisation (HUGO) Pan-Asian SNP Consortium [15] have all significantly improved our understanding of the global genetic diversity of humans, there are still significant gaps in their coverage. India remains one such region, where large genetic diversity and vast population sizes have so far gone underrepresented in genome-wide studies of human genetic diversity despite some important recent advances. [1,16,17,18,19,20,21] Most studies highlight the elevated genetic diversity of the South Asian populations and their general clustering by language group and/or geography. Relying on extensive resequencing rather than on genotyping panel data, [1] showed that 30% of SNPs found in Indian populations were not seen in HapMap populations and that compared to these populations (including Africans) some Indian populations displayed higher levels of genetic variation, whereas some others showed unexpectedly low diversity. Operating with a thin set of genome-wide polymorphisms, [19] identified lower than expected levels of variation across geographically and linguistically distinct populations when sampling Indian immigrants living in the USA. Others have, contrary to this finding, shown high levels of intergroup genetic differentiation of Indian populations sampled in India. [17,18] Furthermore, Reich et al. [18] reported higher than expected levels of homozygosity within Indian groups when examining a high density genome-wide SNP data set and attributed this pattern to population stratification born out of the endogamy associated with the caste system.

Reich et al. [18] have also made an argument for a sizeable contribution from West Eurasia to a putative ancestral north Indian (ANI) gene pool. Through admixture between an ancestral south Indian (ASI) gene pool, this ANI variation was found to have contributed significantly to the extant makeup of not only north (50%–70%) but also south Indian populations (>40%). This is in contrast with the results from mtDNA studies, where the percentage of West Eurasian maternal lineages is substantial (up to 50%) in Indus Valley populations but marginal (<10%) in the south of the subcontinent. [5,22] Because any potential genetic impact into South Asia from the west would involve at least one of the immediately adjacent regions (Central Asia, the Caucasus, or West Asia, including Iran), assessment of the extent of admixture in South Asia and its sources is crippled without genetic data from those regions.

Genome-wide scans on the Human genome diversity panel (HGDP) data involving 51 global populations have revealed that South Asia, represented by Pakistani populations, shares most signals of recent positive selection with populations from Europe, the Near East, and North Africa. [23] Given the environmental differences between Europe and Pakistan and the possible depth of human habitation in South Asia, this result is surprising, but considering the lack of Indian data it remains to be determined whether South Asian-specific signals of positive selection do exist.

To shed more light on the nature of genetic continuity and discontinuity between South and West Asia, the Caucasus, and Central Asia, we applied FST, principal component analysis (PCA) and model-based structure-like approaches to a genome-wide sample of ca. 530,000 SNPs in a sample set combining published data on India, [21,24] relevant global reference populations, [12,24,25,26] and 142 newly genotyped Indian individuals of various linguistic, geographic and social affiliations (Table S1).

Material and Methods

Samples, Genotyping, and Quality Control

We introduce here 142 Indian samples from 30 populations that we have genotyped with Illumina 650K SNP array according to manufacturer's specifications. All subjects filled and signed personal informed consents and the study was approved by the scientific council of the Estonian Biocentre. The data can be accessed through The National Center for Biotechnology Information - Gene Expression Omnibus (NCBI GEO) (GSE33489) and by request to the authors. We analyzed these data together with published data from [12,21,24,25,26] on Indian and other populations used as background. Depending on the analyses (e.g., computational optimization), we included different number of reference populations from these sources (Table S1). Sampling locations for populations analyzed here are shown on Figure S10 (available online) together with a comparison to sampling from the previous study. [18]

We filtered the combined data sets by using PLINK software 1.05 [27] to include only SNPs on the 22 autosomal chromosomes with a minor allele frequency > 1% and genotyping success > 97%. Only individuals with a genotyping success rate > 97% were used. The overlap of SNPs between the different Illumina (610K, 650K, and 660K) arrays in published and new data was ca. 530,000 SNPs; overlap between this data set and the HapMap 3 was 480,000 SNPs. We excluded published data on South Asian populations genotyped on different platforms (Affymetrix) [16,18] from most of the analyses because cross-platform overlap in SNPs is limited (ca. 95,000 SNP) for haplotype based analyses. However, we used data from Reich et al. [18] to validate our FST and PCA results.

Because background linkage disequilibrium (LD) can affect both principal component and structure-like analysis, [28] we thinned the marker set further by excluding SNPs in strong LD (pairwise genotypic correlation r2 > 0.4) in a window of 200 SNPs (sliding the window by 25 SNPs at a time). Depending on the number of reference populations, this yielded data sets of ca. 200,000 SNPs that were used for the respective analyses.
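The quality-control thresholds described above (minor allele frequency > 1%, genotyping success > 97%, and LD thinning at r2 > 0.4 in windows of 200 SNPs moved 25 SNPs at a time) can be illustrated with a short sketch. This is not the authors' pipeline, which used PLINK. The function names `qc_filter` and `ld_prune`, the 0/1/2 genotype matrix layout, the order in which the filters are applied, and the simple greedy pruning rule are all assumptions made here for illustration.

```python
import numpy as np

def qc_filter(genotypes, maf_min=0.01, call_rate_min=0.97):
    """Return boolean masks (keep_individuals, keep_snps).

    genotypes: (n_individuals, n_snps) float array of 0/1/2 allele counts,
    with np.nan marking a missing call (hypothetical layout).
    Assumption: individuals are filtered first, then SNP statistics are
    computed on the retained individuals.
    """
    called = ~np.isnan(genotypes)
    keep_ind = called.mean(axis=1) > call_rate_min

    g = genotypes[keep_ind]
    called = called[keep_ind]
    snp_call_rate = called.mean(axis=0)
    n_called = np.maximum(called.sum(axis=0), 1)   # avoid division by zero
    freq = np.nansum(g, axis=0) / (2 * n_called)   # frequency of counted allele
    maf = np.minimum(freq, 1 - freq)
    keep_snp = (snp_call_rate > call_rate_min) & (maf > maf_min)
    return keep_ind, keep_snp

def ld_prune(g, window=200, step=25, r2_max=0.4):
    """Greedy LD thinning; assumes g has no missing values.

    Within each window, the later SNP of any pair whose squared genotypic
    correlation exceeds r2_max is dropped.
    """
    n_snps = g.shape[1]
    keep = np.ones(n_snps, dtype=bool)
    for start in range(0, n_snps, step):
        idx = [j for j in range(start, min(start + window, n_snps)) if keep[j]]
        for pos, a in enumerate(idx):
            if not keep[a]:
                continue
            for b in idx[pos + 1:]:
                if not keep[b]:
                    continue
                r = np.corrcoef(g[:, a], g[:, b])[0, 1]
                if r * r > r2_max:
                    keep[b] = False
    return keep
```

In practice a dedicated tool is far faster than this quadratic loop; the sketch only makes the thresholds concrete.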

Phylodemographic Analyses

We calculated mean pairwise FST values between populations (and regional population groups) for all autosomal SNPs by using the approach of Weir et al. [29] assembled into an in-house R script. For FST calculation, the combined data set was filtered to include only populations with n > 4. In some cases geographically close populations with a smaller sample size were grouped. Given the high levels of population structure within India, resulting from restricted gene flow between populations, genetic drift in small endogamous units, and our small sample sizes, the interpretation of FST distances between pools of samples from different, although genetically closely related, populations might not, necessarily, be straightforward. To validate our results, we repeated the analysis on the data set that included more Indian samples [18] but fewer SNPs (Table S1). Here too, we recalculated FST values excluding population pools and setting a threshold of minimum of seven samples per population.

PCA was carried out in the smartpca program [28] on the Eurasian populations. To increase population coverage, we included here data from Reich et al. [18] and the resulting cross-platform SNP panel consisted of 95,001 post quality control SNPs (Table S1). Geographic spread of principal components ([PCs] averaged to population level) was visualized with kriging procedure in Surfer package of Golden Software. Spatial autocorrelation and modified t test that estimate correlation of spatially located variables and correct for spatial autocorrelation were carried out in Passage 2. [30] Geographic distances between populations were calculated as Euclidean distances between x and y coordinates on a Conformal Conic Asia Lambert projection.

To monitor convergence between individual runs, we ran ADMIXTURE [31] 100 times at K = 2 to K = 18 (Figure S4 A). The lowest cross validation indexes that point to the best K were observed at K = 15, but there was no significant difference above K = 10 (Figure S4 C). However, judging by low level of variation in Loglikelihood scores (LLs < 1) within a fraction (10%) of runs with the highest LLs, [24,32] we assume that the global maximum was reached at K = 2 to K = 8, K = 12, and K = 13 (Figure S4 B) rendering these practical representations of genetic structure at different levels of resolution. We also verified that all runs within these 10% of runs at these values of K did indeed produce a very similar (indistinguishable) ancestry proportions pattern.

Haplotype Diversity Associated with Ancestry Informative Markers

We used the individual ancestry proportion inferred by using ADMIXTURE as a quantitative trait and tested for association. Allele dosage for an SNP associated with a given ancestry is expected to increase with an increasing proportion of ancestry. Assuming such a relationship between the genotype and trait value, we used regression analysis to estimate how strongly each SNP is associated with a given ancestry. We expected a large number of SNPs to be associated with a given ancestry component, and we chose not to apply any multiple testing correction procedures. Instead, we chose to filter out statistically significant regression coefficients (beta values) by using arbitrarily chosen significance threshold, therefore occasional false positive SNPs are negligible. In order to select only strongly associated SNPs, we further filtered SNPs to retain only those exceeding 90 or 95 percentile points of positive beta-value distribution.

The haplotype diversity flanking each associated SNP was then summarized with the number of distinct haplotypes. A summary statistic derived from the number of distinct haplotypes across genomic windows has been shown to be informative about past population demography. [33] In this study, Lohmueller et al. [33] considered the joint distribution of two haplotype based statistics: the number of distinct haplotypes and the count of the most common haplotype. Here, we use only the number of distinct haplotypes to measure haplotype diversity. Genomic windows of different size (0.45, 0.26, 0.1, and 0.05 centiMorgans) were defined around each associated SNP, and the number of distinct haplotypes within each window was counted. We followed [33] and randomly selected a subset of nSNP SNPs from each window to ensure that all windows have the same number of SNPs and that the resulting statistics are not affected by the unequal distribution of markers across the genome. Within each window we randomly sampled nSNP SNPs multiple times, counted the number of distinct haplotypes each time, and took the average as a summary. For each population, we randomly chose ten individuals and counted the total number of windows having 0, 1, 2, ..., nmax number of haplotypes and plotted this summary statistic by using heatmap.

Nucleotide substitutions arising in one population and then introduced to other populations are expected to show different levels of haplotype diversity in the source and recipient populations. However, this difference gets diluted because hybrid haplotypes arise through recombination in the recipient population. Their number will increase each generation, and it is therefore important to explore how the number of generations since the migration into new population will affect our ability to detect source and recipient populations for a given mutation on the basis of haplotype diversity differences. We generated population samples by simulating admixture events between European and Asian populations as described in the next section. We explored haplotype diversity flanking SNPs associated with Asian ancestry in these samples from admixed populations. Our simulated data set shows that when European population is the recipient and Asian population is the source, then

(1) Haplotype diversity flanking Asian alleles in admixed recipient populations is lower than in source Asian populations for all the simulated admixture events except for the oldest one that occurred 750 generations ago. The latter case confirms our expectation that immigrant alleles will be flanked with a higher number of hybrid haplotypes (those having both Asian and European ancestry blocks) with an increasing number of generations since admixture.

(2) Haplotype diversity flanking European alleles in admixed populations can be comparable (for those populations having 70% of European ancestry) or even higher (for those having 90% European ancestry) than in the original European population despite the fact that admixed populations always have lower European ancestry (90% or 70%) than the original European population. This might be because of novel hybrid haplotypes produced by the recombination process.

Our simulations show that haplotype diversity flanking autosomal SNPs can be used to infer source population even when populations dispersed these alleles 288, 400, or 500 generations ago. Assuming an average human generation interval of 25 years, this is 7,200, 10,000 or 12,500 years, which roughly overlaps with the Neolithic period.
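The window-based summary used in this section (repeatedly subsample a fixed number of SNPs from a window, count the distinct haplotypes, and average the counts) can be sketched as follows. This is only an illustration under stated assumptions: the function name `mean_distinct_haplotypes`, the representation of phased haplotypes as equal-length strings, and the default of 100 random draws are hypothetical choices, not details taken from the paper.

```python
import random

def mean_distinct_haplotypes(haplotypes, n_snp, n_draws=100, seed=0):
    """Average number of distinct haplotypes over random SNP subsets.

    haplotypes: list of equal-length sequences, one per phased chromosome,
    already restricted to a single genomic window (hypothetical layout).
    n_snp: number of SNPs sampled from the window in each draw, so that
    windows with different marker density remain comparable.
    """
    n_sites = len(haplotypes[0])
    if n_snp > n_sites:
        raise ValueError("window has fewer SNPs than n_snp")
    rng = random.Random(seed)
    total = 0
    for _ in range(n_draws):
        # Pick n_snp column positions, then compare haplotypes on them only.
        cols = sorted(rng.sample(range(n_sites), n_snp))
        distinct = {tuple(h[c] for c in cols) for h in haplotypes}
        total += len(distinct)
    return total / n_draws
```

Fixing the number of sampled SNPs per window is what keeps the count from simply tracking marker density, which is the reason the text gives for the subsampling step.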

Demographic Model for Simulations

We used MaCS coalescent simulator34 to generate simulated data for three nonadmixed and 18 admixed populations by modifying the demographic model originally published in.35 In this study a series of population genetic statistics were used to fit demographic history of simulated populations to those observed for African, Asian, and European populations. Here, we used these demographic parameters to simulate samples of sequences drawn from African, Asian, and European populations. An additional 18 admixed populations were generated by simulating admixture events between European and Asian populations at different times in the past (measured in generations) and using different proportions: 50/50, 70/30, and 90/10 of sequences from European and Asian populations, respectively:

(1) Admixture 750 generations ago; assuming one generation to be 25 years, this is roughly 18,750 years ago
(2) Admixture 500 generations, ∼12,500 years, ago
(3) Admixture 400 generations, ∼10,000 years, ago
(4) Neolithic admixture 288 generations ago, that is 62 generations after Neolithic expansion in a European population as defined in the best fit model of Schaffner et al.35
(5) Late Bronze Age/Iron Age admixture 138 generations, ∼3,450 years, ago
(6) Historical time admixture 70 generations, ∼1,750 years, ago

We used the recombination rate ratio (cM/Mb) mappings for the first chromosome from HapMap project36 to model variation in recombination rate in simulated sequences. The total physical length of simulated sequences was 250 megabases. From each simulated population a sample of 30 sequences was drawn to construct 15 genotypes that were then subjected to quality control and LD pruning steps as for the Illumina genotyped populations analyzed in this study. Admixture proportions for each simulated individual were then inferred with structure-like analysis assuming three populations. SNPs associated with Asian and European ancestry and haplotype diversity flanking them were identified as described above.

Testing for Selection

The combined data set was filtered to include Indian populations and a comprehensive set of reference populations that yielded a data set of 990 individuals and 531,315 autosomal SNPs (Table S1). This data was phased with
Beagle 3.1.37 Although integrated haplotype score (iHS) and cross population extended haplotype homozygosity (XP-EHH) have already been calculated for the HGDP-Centre d'Etude du Polymorphisme Humaine (CEPH) panel,23 we recalculated all statistics by using our 531,315 SNPs to allow for unbiased comparisons between India and other geographic regions. XP-EHH and iHS were calculated as previously described with tools provided by J. Pickrell.25 Genetic distances between markers were calculated with the HapMap genetic map.36 For iHS, ancestral and derived states for each SNP were established by comparison to the UCSC snp128OrthoPanTro2RheMac2 table. Where the chimpanzee allele was known, it was assumed to be the ancestral allele; where the allele was unknown (17,868 SNPs, 3.36% of the data), the SNP was excluded from all subsequent calculations. XP-EHH and FST require two populations. Because the Mandenka, Yoruba, and Bantu farmers have clustered together in previous analyses of population structure,38 they were grouped together in our analyses and were used as the outgroup population for all comparisons. HGDP Europeans were used as the outgroup for analyses where the focal population was African farmers. Both XP-EHH and iHS scores were normalized and windowed as in Pickrell;23 however, we chose not to merge any adjacent outlier windows because this procedure can be very conservative and significantly affect the ranking of windows (data not shown).

Enrichment Testing

We retrieved the list of RefSeq genes from the UCSC table browser and mapped the starting and ending coordinates of all genomic transcripts to our windows. The longest transcript length was used for genes with multiple transcripts. On the basis of this list, we performed searches for gene enrichment for all Gene Ontology (GO) terms by using DAVID 6.7 on all genes in the top 1% and 5% windows of the iHS and XP-EHH test statistic distributions.39,40

Results

We have based our analyses of human genetic variation on a sample of 1310 individuals that belong to 112 populations. The sample set includes 142 previously unpublished samples from India and published compatible data from South Asia
and beyond (Table S1), chosen to represent the global and regional contexts of human genetic variation. For some analyses we also included published data on Indian populations18 genotyped on a different platform; adding these sources yielded a combined data set of 1,442 individuals but only ca. 95,000 SNPs (Table S1).

Mean pairwise FST values29 within and among continental regions (Figure 1) reveal that the South Asian autosomal gene pool falls into a distinct geographic cluster, characterized internally, like other continental regions, by short interpopulation genetic distances (<0.01). At the interregional scale, the South Asian cluster shows somewhat shorter genetic distances with West Eurasian (average FST = 0.042) than with East Asian (average FST = 0.051) populations. Importantly, the Pakistani (Indus Valley) populations differ substantially from most of the Indian populations and show comparably low genetic differentiation (within the FST range of 0.008–0.020) from European, Near Eastern, Caucasian, and Indian populations (Figure 1 and Figures S1 and S11). In agreement with previous Y-chromosome studies,41,42 the Brahmin and Kshatriya from Uttar Pradesh stand out by being closer to Pakistani (FST = 0.006 on average) and West Eurasian populations (FST = 0.030) than to other Indian populations (average FSTs 0.017 and 0.046, respectively) from the same geographic area (Figures S1 and S11).

Figure 1 Matrix of Pairwise Mean FST Values of Regional Groupings of the Studied Populations
Average of intergroup FST values (where the regional group is composed of
multiple populations) is given in the diagonal. Central India is itself a composite of two regional groupings of samples from different populations that makes the negative intergroup FST uninformative.

Similar to the patterns revealed by the pairwise FST results, PCA of the Eurasian populations clusters them by geographic proximity with the first component separating West from East Eurasia and the second component differentiating South Asian populations from the rest (Figure 2A and Figures S12 and S13). Consistent with their geographic location, Pakistanis are positioned between Indian and West Eurasian populations on this plot. However, whereas Reich et al.18 identified a cline of Indian populations toward Europe with no corresponding cline within the Europeans, we observe a more complex picture. The inclusion of more populations from Europe and the Caucasus24,26 reveals a cline within the West Eurasian cluster on the PCA (Figure 2A), where both PC1 (r = 0.59) and notably PC2 (r = 0.87) display significant correlation with distance from Spain and Iran, respectively (Figure S2). On this PC1 × PC2 composite cline, most of the Indian populations form a disperse cluster, an edge of which is formed by a subset of the Hapmap Gujaratis and Uttar Pradesh Brahmins and Kshatriyas. Compared to Gujaratis, the Uttar Pradesh samples are more widely dispersed, overlapping substantially with most of the samples from the southern, Dravidic speaking states of Tamil Nadu and Andhra Pradesh. Furthest on the PC2 axis lay samples from the southern Indian states of Karnataka, Kerala, and the Pulliyar population from Tamil Nadu.

Figure 2
Genome-Wide Structure of the Studied Populations Revealed by 530,000 SNPs
(A) Principal component analysis of the Eurasian populations. The following abbreviations are used: IE, Indo European speakers; DR, Dravidic speakers; AA, Austroasiatic speakers; TB, Tibeto Burman speakers.
(B) ADMIXTURE analysis at K = 8 and 12. The following abbreviations are used: A.P., Andhra Pradesh; Kar., Karnataka; Ker., Kerala; T. Nadu, Tamil Nadu. The following symbols are used: ∗, ∗∗, #, §, 1, 2, and 3. Their notes refer to data from Hapmap; Nihali language isolate speakers from Maharashtra; Tibeto Burman speakers from east Indian states Meghalaya and Nagaland; AA, Austroasiatic languages; samples from Chattisgarh, Jharkhand, Orissa, Madhya Pradesh, Rajasthan, and Karnataka; and groups that contain one Dhurwa or one Lambadi.

Notably, within South Asia (India and Pakistan), PC1 is strongly correlated (r = 0.69) with longitude and PC2 with latitude (r = 0.60). Both remain significant after correcting for spatial autocorrelation. These relations are identifiable also from spatial representations of the principal components (Figure S2). The third PC differentiates West Eurasia by latitude, and we find Bedouins and Lithuanians on either end of the PC3 axis (Figure S3). The fourth PC is of particular interest because it connects Baluchistan, the Caucasus, and Central Asia (Figures S2 and S3). The spread of PC4 in West Eurasia is not concentric and thus difficult to explain by correlation with geographic distance from any one point. The strongest correlation is with distance from Iran (r = 0.69), but this is to a large extent explained by spatial autocorrelation because correcting for that renders a p value slightly over 0.05. Notable, however, is that PC4 has nonmarginal values also in northeast China, which is difficult to absorb into current models of human demographic history. Overall, PCA reveals that the genetic landscape of South Asia is characterized by two principal components of which PC2 is specific to India and PC4 to a wider area encompassing Pakistan, the Caucasus, and Central Asia.

In order to study this duality in more detail, we used the model-based structure-like algorithm ADMIXTURE31 that computes quantitative estimates for individual ancestry in constructed hypothetical ancestral populations. Most South Asians bear membership in only two of the constructed ancestral populations at K = 8. These two main ancestry components, k5 and k6 (colored light and dark green in Figure 2B), are observed at all K values between K = 6 and K = 17
(Figure S4). These correlate (r > 0.9, p < 0.00001) perfectly with PC4 and PC2 in West Eurasia, respectively. Looking at the Pakistani populations (0.51) and Baluchistan (Balochi, Brahui, and Makrani) in particular (0.59), the proportion of the light green component (k5) is significantly higher than in the Indian populations (on average 0.26) (Figure S5). Importantly, the share of this ancestry component in the Caucasus populations (0.50) is comparable to the Pakistani populations. There are a few populations in India who lack this ancestry signal altogether. These are the thus-far sampled Austroasiatic tribes from east India, who originated in Southeast Asia and represent an admixture of Indian and East Asian ancestry components,21 and two small Dravidian-speaking tribes from Tamil Nadu and Kerala. However, considering the geographic spread of this component within India, there is only a very weak correlation (r = 0.4) between probability of membership in this cluster and distance from its closest core area in Baluchistan (Figure S6). Instead, a more steady cline (correlation r = 0.7 with distance from Baluchistan) of decrease of probability for ancestry in the k5 light green ancestral population can be observed as one moves from Baluchistan toward north (north Pakistan and Central Asia) and west (Iran, the Caucasus, and, finally, the Near East and Europe).

If the k5 light green ancestry component (Figure 2B) originated from a recent gene flow event (for example by a demic diffusion model) with a single center of dispersal where the underlying alleles emerged, then one would expect different levels of associated haplotypic diversity to suggest the point of origin of the migration. Because recombination on autosomal chromosomes will over time erase the signal and thus limit the utility of this approach, we used simulations to explore how deep in time one can go to trace directionality of migration (Figure S7). Our simulations show that differences in haplotype diversity between source and recipient populations can be detected even for migration events that occurred 500 generations ago (∼12,500 years ago assuming one generation to be 25 years). To assess diversity within the ancestry components revealed by the ADMIXTURE analyses at K = 8, we counted the number of unique haplotypes in genomic windows surrounding SNPs in strong positive association with this ancestry component. For alleles associated with k5, haplotype diversity is comparable among all studied populations across West Eurasia and the Indus basin (Figure S8).
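The haplotype-counting step described above can be illustrated with a small sketch. It is not the authors' pipeline: the function names, the toy phased haplotypes, and the choice to define a window as a fixed number of SNP positions on each side of a focal SNP are all assumptions made for illustration (the study itself works with genomic windows on real phased data).

```python
from collections import Counter


def unique_haplotypes(haplotypes, center, flank):
    """Count distinct haplotypes in the SNP window [center - flank, center + flank].

    `haplotypes` is a non-empty list of equal-length strings, one per phased
    chromosome, with one character per SNP (hypothetical toy encoding).
    """
    start = max(0, center - flank)
    end = min(len(haplotypes[0]), center + flank + 1)
    return len({h[start:end] for h in haplotypes})


def window_count_distribution(haplotypes, focal_snps, flank):
    """Map 'number of unique haplotypes' to 'number of windows with that count'."""
    return Counter(unique_haplotypes(haplotypes, c, flank) for c in focal_snps)


# Toy data: four phased chromosomes typed at seven SNPs.
haps = ["0101100", "0101100", "0111100", "1101001"]
# Hypothetical indices of SNPs associated with one ancestry component.
focal = [1, 3, 5]

distribution = window_count_distribution(haps, focal, flank=1)
print(distribution)
```

Comparing such distributions between populations is one way to express "higher" or "lower" haplotype diversity around ancestry-associated alleles; sample sizes would need to be matched across populations for the counts to be comparable.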

However, we found that haplotypic diversity of this ancestry component is much greater than that of those dominating in Europe (k4, depicted in dark blue) and the Near East (k3, depicted in light blue), thus pointing to an older age of the component and/or long-term higher effective population size (Figure S8). Haplotype diversity flanking Asian alleles (k7) is twice greater than that of European alleles; this is probably because the k7 ancestry component is a composite of two Asian components (see Figure S4, K > 10). In contrast to widespread light green ancestry, the dark green ancestry component, k6 is primarily restricted to the Indian subcontinent with modest presence in Central Asia and Iran. Haplotype diversity associated with dark green ancestry is greatest in the south of the Indian subcontinent, indicating that the alleles underlying it most likely arose there and spread northwards. It is notable that this ancestry component also exhibits greater haplotype diversity than European or Near Eastern components despite the fact that the Illumina genotyped markers were principally ascertained in a sample of European individuals. This observation shows again that haplotype based measures of diversity can be relatively robust to ascertainment bias.43

Long-standing human habitation of the Indian subcontinent should have provided ample opportunity for the action of positive selection and the emergence of adaptations to the local environment. To examine this possibility in greater detail, we calculated iHS44 and XP-EHH,45 two haplotype-based tests that detect positive natural selection, for all Dravidian and Indo-European speaking Indian individuals in our combined data set (n = 154). After dividing the autosomal genome into 13,274 nonoverlapping 200 kb windows covered by our SNP data set (see Material and Methods), we calculated the fraction of windows in the top 1% of the Indian test statistic distribution shared with the top 5% windows in other populations (Figure 3). Our results largely agree with the recent description of three main patterns underlying selective sweeps in continental Eurasian populations following the out-of-Africa event46 and suggest that Indian sweep signals have more in common with those detected in West rather than East Eurasia. However, when we compare the fraction of outlying Indian signals also found in European or East Asian populations to the fraction of outlying Pakistani signals
shared with the same regions, we find Pakistan consistently appearing markedly more similar to West Eurasian than to Indian populations (Figure 3). This result remains when we examine signals of recent positive selection in north and south India separately. Combined with our ADMIXTURE and PCA results, this is powerful evidence that Pakistan is a poor proxy for South Asian genetic diversity, despite having often fulfilled this role in previous publications.

Figure 3 Sharing Signals for Selection between Continental Populations
(A) iHS signal sharing between continental populations. The fraction of signals found in the top 1% of iHS scores in population i and the top 5% of population j is given in cell (i,j). Africa refers to Yoruba, Mandenka, and Bantu individuals from the HGDP-CEPH panel.
(B) XP-EHH signal sharing between continental populations. The fraction of signals found in the top 1% of XP-EHH scores in population i and the top 5% of population j is given in cell (i,j). Africa refers to Yoruba, Mandenka, and Bantu individuals from the HGDP-CEPH panel.

To gain insight into the type of biological processes likely to have come under positive selection in India, we tested for overrepresentation of GO47 terms in the countrywide results. These analyses revealed that 20 GO terms were overrepresented in our windowed top 1% iHS results and 27 were overrepresented in the top 1% XP-EHH results when an individual 0.05 significance level was used (Table S2, Significantly Enriched GO Terms, Reported by DAVID Analysis, Before FDR Correction, in the Top 1% iHS All India Results; Table S3, Significantly Enriched GO Terms, Reported by DAVID Analysis, Before FDR Correction, in the Top 1%
XP-EHH All India Results, Reported by DAVID Analysis). These results include terms such as lipid metabolism and catabolism, which are associated with genes implicated in the etiology of type 2 diabetes (MIM 125853), the incidence of which is rapidly growing in India and could represent maladaptations to recent changes in the environment, diet, and lifestyle following industrialization.48,49,50 However, after false-discovery-rate (FDR) correction for multiple testing, no terms associated with genes found in the top 1% of either test remained significant.

Nevertheless, and because positive selection does not necessarily entail pathway enrichment, we note that one of the strongest XP-EHH signals (Table S4, Top 20 Most Significant iHS Windows in the All India Results; Table S5, Top 20 Most Significant iHS Windows in the All India Results) is a region in chromosome 20 containing the DOK5 (MIM 608334), a member of the insulin signaling pathway.51 A three SNP haplotype in this gene has been associated with increased risk of obesity and type 2 diabetes in a large homogeneous north Indian sample,52 although this association has yet to be replicated in another cohort. The gene is the seventh strongest signal in the countrywide results (empirical p = 0.0007), and the seventh and 16th most significant signal in south and north Indian, respectively. Notably, the window is also present in the top 5% results in Europe and East Asia, but nowhere else is evidence for positive selection for this gene nearly as powerful as it is in the Indian subcontinent. Also strongly outlying (XP-EHH empirical p = 0.0015) is CLOCK (MIM 601851), a key regulator of circadian rhythms in humans, which shows strong evidence of selection in all populations, although principally in West Eurasia; it is also within the top 20 European windows but only at the tail end of the top 5% in East Asia. Its disruption has been shown to associate with the development of type 2 diabetes53 and the etiology of metabolic syndrome (MIM 605552)54 as well as with general energy intake in overweight subjects.55,56 Other genes in the window are TMEM165, a transmembrane protein of no known function and SRD5A3 (MIM 611715), a steroid reductase implicated in androgen signaling in some types of prostate cancer.57 Finally, an interesting candidate for selection according to both XP-EHH and iHS results is MSTN (MIM 601788), a negative regulator of skeletal muscle tissue development expressed in utero and also associated with body fat accumulation
and expressed throughout gestation in the human placenta, where it plays a role in glucose uptake.58,59,60,61 The gene shares a window with an uncharacterized reading frame, C2orf88, and HIBCH (MIM 610690), a component in the propionate catabolism pathway;62 the window is associated with extremely significant empirical p values in both iHS and XP-EHH scans (Table S4). MSTN has been identified as a target of strong positive selection twice already on the basis of an excess of derived alleles that indicate the action of positive diversifying selection, especially in African individuals,63,64 although neither of the implicated SNPs are included in our data, rendering successful reconstruction of the haplotypes presented by Saunders64 in our data impossible without additional genotyping. Nonetheless, FST at the genomic window associated with MSTN is high when compared to genomic averages between Indians and Europeans, and between Indians and African farmers, although low between Indians and East Asians.

Discussion

Relative to East and West Eurasia, the populations of the Indian subcontinent have been underrepresented in genome-wide data sets that have been compiled in attempts to address global patterns of variation at common SNPs. In this study we have asked how representative of South Asian genetic variation are the available and widely used data sets including populations of Pakistan from the HGDP25 and Gujaratis from HapMap Phase 3 data.12 While combining the new data we generated for north and south Indian populations with these public resources, we confirmed the existence of a general principal component cline stretching from Europe to south India.18 Pakistani populations are in the middle of this cline (Figure 1) and show similar FST distances both to populations of Europe and to those of south India, suggesting that they might represent only a fraction of genetic variation in South Asia just as they represent only a fraction of genetic variation in Europe. Additionally, the relatively low genetic diversity among Pakistani populations (average pairwise FST 0.0056, although this measure excludes the Hazara, who show substantial admixture with Central Asian populations; see Figure 2) is less than one third of the diversity observed among all South
Asian populations (0.0184), even when excluding the most divergent Austroasiatic and Tibeto-Burman speaking groups of east India. However, the HapMap Gujaratis show genetic distances to other global populations, similar to those estimated for other populations of India and appear on the Indian cline between Pakistanis and south Indians, thus being better representatives of the genetic diversity of South Asia than Pakistanis. Notably, although the geographic representation of Indian populations on our PC plot is neither comprehensive nor balanced, we note that on average the Gujarati samples position 0.78 standard deviations from the location of the Indian mean (excluding the outlying Austroasiatic and Tibeto-Burman speakers). This is about five times more than the mean value from samples from Uttar Pradesh, for example, which appear very close to our all-Indian mean. For comparison, on average the Pakistani and Tamil Nadu samples are located 3.06 and 0.95 standard deviations away from the Indian mean, respectively.

Although the Pakistani and Indian populations have largely nonoverlapping distributions on our PC plot (Figure 2), all South Asian populations, except for Indian Tibeto-Burman speakers, show lower FST distances to Europe than to East Asia (Figure 1). This could be either because of Indian populations sharing a common ancestry with West Eurasian populations because of recent gene flow or because East Asian populations have relatively high pairwise FST with other non-African populations, probably because of their history of genetic bottlenecks.46,65,66,67 Similarly, the clines we detect between India and Europe (e.g., PC1 and PC2 in Figure 2 and Figure S2) might not necessarily reflect one major episode of gene flow but be rather a reflection of complex demographic processes involving drift and isolation by distance. Nevertheless, the correlation of PC1 with longitude within India might be interpreted as a signal of moderate introgression of West Eurasian genes into western India, which is consistent with previous studies on uniparental5,6 and autosomal markers.18

Overall, the contrasting spread patterns of PC2 and PC4, and of k5 and k6 in the ADMIXTURE analysis (Figure 2 and Figures S2 and S6), could be seen as consistent with the recently advocated model where admixture between two inferred ancestral gene pools (ancestral northern Indians [ANI] and ancestral southern Indians [ASI]) gave rise to the extant South Asian populace.18 The geographic spread of the Indian-specific PC2 (or k6) could at least partly correspond to the genetic signal from the ASI and PC4 (or k5), distributed across the Indus Valley, Central Asia, and the Caucasus, might represent the genetic vestige of the ANI (Figure S2). However, within India the geographic cline (the distance from Baluchistan) of the Indus/Caucasus signal (PC4 or k5) is very weak, which is unexpected under the ASI-ANI model, according to which the ANI contribution should decrease as one moves to the south of the subcontinent. This can be interpreted as prehistorical migratory complexity within India that has perturbed the geographic signal of admixture.

Overall, the locations of the Indian populations on the PC1/PC2 plot (Figure 2A and Figure S12) reflect the correlated interplay of geography and language. In concordance with the geographic spread of the respective language groups, the Indian Indo-European- and Dravidic-speaking populations are placed on a north to south cline. The Indian Austroasiatic-speaking populations are, in turn, in agreement with their suggested origin in Southeast Asia21 drawn away from their Indo-European speaking neighbors toward East Asian populations. In this respect, it is interesting to note that, although represented by only one sample each, the positions of Indo-European-speaking Bhunjia and Dhurwa amidst the Austroasiatic speakers probably corroborates the proposed language change for these populations.68

In structure-like analyses, membership in multiple ancestry components can be interpreted as admixture, shared ancestry, or even unresolved ancestry.25,69 However, some heuristic interpretations of the ancestry proportions palette in terms of past migrations seem too obvious to be ignored. For example, it was first suggested by the German orientalist Max Müller that ca. 3,500 years ago a dramatic migration of Indo-European speakers from Central Asia (the putative Indo Aryan migration) played a key role in shaping contemporary South Asian populations and was responsible for the introduction of the Indo-European language family and the caste system in India. A few studies on mtDNA and Y-chromosome variation have interpreted their results in favor of the hypothesis,70,71,72 whereas others have found no genetic evidence to support it.3,6,73,74 However, any nonmarginal migration from Central Asia to South Asia
should have also introduced readily apparent signals of East Asian ancestry into India (see Figure 2B). Because this ancestry component is absent from the region, we have to conclude that if such a dispersal event nevertheless took place, it occurred before the East Asian ancestry component reached Central Asia. The demographic history of Central Asia is, however, complex, and although it has been shown that demic diffusion coupled with influx of Turkic speakers during historical times has shaped the genetic makeup of Uzbeks75 (see also the double share of k7 yellow component in Uzbeks as compared to Turkmens and Tajiks in Figure 2B), it is not clear what was the extent of East Asian ancestry in Central Asian populations prior to these events.

Another example of an heuristic interpretation appears when we look at the two blue ancestry components (Figure 2B) that explain most of the genetic diversity observed in West Eurasian populations (at K = 8): we see that only the k4 dark blue component is present in India and northern Pakistani populations, whereas, in contrast, the k3 light blue component dominates in southern Pakistan and Iran. This patterning suggests additional complexity of gene flow between geographically adjacent populations because it would be difficult to explain the western ancestry component in Indian populations by simple and recent admixture from the Middle East.

Several aspects of the nature of continuity and discontinuity of the genetic landscape of South Asia and West Eurasia still elude our understanding. Whereas the maternal gene pool of South Asia is dominated by autochthonous lineages, Y chromosome variants of the R1a clade are spread from India (ca 50%) to eastern Europe and their precise origin in space or time is still not well understood.76 In our analysis we find genetic ancestry signals in the autosomal genes with somewhat similar spread patterns. Both PC2 and k5 light green at K = 8 extend from South Asia to Central Asia and the Caucasus (but not into eastern Europe). In an attempt to explore diversity gradients within this signal, we investigated the haplotypic diversity associated with the ancestry components revealed by ADMIXTURE. Our simulations show that one can detect differences in haplotype diversity for a migration event that occurred 500 generations ago, but chances to distinguish signals for older events will apparently decrease with increasing age because of recombination. In terms of human population history, our oldest simulated migration event occurred roughly 12,500 years ago and predates or coincides with
the initial Neolithic expansion in the Near East. Knowing whether signals associated with the initial peopling of Eurasia fall within our detection limits requires additional extensive simulations, but our current results indicate that the often debated episode of South Asian prehistory, the putative Indo-Aryan migration 3,500 years ago (see e.g., Abdulla15) falls well within the limits of our haplotype-based approach. We found no regional diversity differences associated with k5 at K = 8. Thus, regardless of where this component was from (the Caucasus, Near East, Indus Valley, or Central Asia), its spread to other regions must have occurred well before our detection limits at 12,500 years. Accordingly, the introduction of k5 to South Asia cannot be explained by recent gene flow, such as the hypothetical Indo-Aryan migration. The admixture of the k5 and k6 components within India, however, could have happened more recently; our haplotype diversity estimates are not informative about the timing of local admixture.

Both k5 and k6 ancestry components that dominate genetic variation in South Asia at K = 8 demonstrate much greater haplotype diversity than those that predominate in West Eurasia. This pattern is indicative of a more ancient demographic history and/or a higher long-term effective population size underlying South Asian genome variation compared to that of West Eurasia. Given the close genetic relationships between South Asian and West Eurasian populations, as evidenced by both shared ancestry and shared selection signals, this raises the question of whether such a relationship can be explained by a deep common evolutionary history or secondary contacts between two distinct populations. Namely, did genetic variation in West Eurasia and South Asia accumulate separately after the out-of-Africa migration; do the observed instances of shared ancestry component and selection signals reflect secondary gene flow between two regions, or do the populations living in these two regions have a common population history, in which case it is likely that West Eurasian diversity is derived from the more diverse South Asian gene pool.

Similar to observed patterns of neutral genetic diversity, one could ask whether Indian populations contain a reservoir of selective signals hitherto unidentified in other Old World groups, akin to what has been found in uniparentally inherited
markers, or whether the region fits into the Eurasian landscape of positive selection signals. Previous studies23,77 have shown that the vast majority of XP-EHH signals are shared across extended geographic distances. XP-EHH, by its nature, detects older or stronger sweeps acting on alleles that have reached high frequency in a given population.23 At the global level, our haplotype-based scans of positive selection showed similar patterns of signal sharing to those revealed by FST comparisons. In fact, barring the actual numbers on them, Figure 1 and Figure 3 bear a striking similarity to each other, and Indian as well as Pakistani populations share more signals with West Eurasia than with the rest of the world. Despite this, the results leave ample room for the existence of local adaptation to the Indian environment, both recent and old. Compared to Pakistani populations (87%), both north (66%) and south Indian (52%) populations share substantially fewer signals of complete selective sweep with European populations (Figure S9). Sharing of the complete sweep signals between India and East Asia is even lower (53%). In the case of iHS, Indian signals sharing with Europe and East Asia was less pronounced (37% and 32%, respectively), probably stemming from the nature of iHS, as it detects younger, on-going sweeps and is therefore more likely to highlight recent, private signals of local adaptation that have not yet become widespread by gene flow.

Our analysis of the genes contained within the top 1% of selective signals in the countrywide data suggested that 25 GO terms were overrepresented among our strongest selection candidates, although none were significant after Benjamini correction. We also tested the top 5% of results in the Indian data and found that five GO terms related to cell-cell binding and metal ion binding remained highly significant after multiple testing corrections (data not shown). However, examination of the genes associated with these terms revealed that all significant results could be ascribed to positional gene clustering, whereby multiple genes associated with the same GO term, generally members of a single gene family, fell within the same 200 kb window but were treated as independent findings by the gene set enrichment analysis tool we used. It is worth recalling that gene enrichment tools were originally devised for the assessment of gene expression changes in microarray RNA work, where individual genes could be unequivocally identified. Given the degree of resolution provided by the data sets that we have

80 this difference persists into adulthood.81. The latter approach could successfully correct for the clustering effect we identify and more generally for the effect of gene size on enrichment results. CLOCK. whereby ontological associations are mapped not to individual genes. and soon greatest in absolute terms. whereas MSTN is not an obvious candidate for involvement in disease etiology because its main function is negative regulation of muscle development in utero. We believe that collapsing annotations to the window level could reduce the false-positive rate in enrichment scans. these children are already adipose and exhibit some degree of insulinresistance when compared to European babies.83 as well as a sizeable number of cases of the metabolic syndrome. although at the same time it would be far more conservative and risk obscuring genuine signals. is in a window that contains seven other genes.81 It bears recalling that India has one of the world's fastest growing. Indian newborns weigh on average 700 g less than their European counterparts yet have a similar absolute fat mass.82. MSTN. and GO categories associated with long genes are therefore more likely to appear enriched. incidence of type 2 diabetes. whereby long genes are more likely to be statistical outliers simply because they contain more SNPs than short genes. which can be expected to contain candidates for adaptation via classical sweeps. such that the average age of diagnosis of diabetes in India is 10 years lower than in Europe. we chose to examine the contents of the 20 strongest iHS and XP-EHH signals. Interestingly. although one of them. In the wake of these results.86 Phenotypically. for example. and PPARA—implicated in lipid metabolism and etiology of type 2 diabetes. even nonobese Asian Indians have .84 both of which have been linked to recent rapid urbanization. PPARA. 
none of the five significant GO terms at the genic level are significant when examined at the windowed level (data not shown). but rather to the windows they occupy.used here. In our data. Variation in DOK5 and CLOCK has been previously associated with type 2 diabetes and metabolic disorders.79 At birth. Alternatives include the precise CMS test that often is applicable on dense HapMap2 data78 or a windowing approach. it also plays a significant role in glucose uptake.85. Within these regions we find four genes— DOK5. any attempts to use automated annotation tools to understand signals of positive selection extending over multiple genes is fraught with interpretative perils.

Reisberg.V. M. Soo.90 In this context. A. thank the European Union European Regional Development Fund through the Centre of Excellence in Genomics to Estonian Biocentre. S. Therefore.45. Underhill for discussion.been shown to exhibit increased levels of insulin resistance compared to European controls.89 and show differences in adipocyte morphology.M.23. I.C. This intricacy cannot be readily explained by the putative recent influx of Indo-Aryans alone but suggests multiple gene flows to the South Asian gene pool. both from the west and east. a change that became disadvantageous after changes in diet and lifestyle. lipid metabolism and type 2 diabetes are all complex traits and the effect of natural selection would be expected to be fragmented across multiple genes. Pickrell and J. G. R. Raj. Summing up. and C.87 They also have increased levels of both subcutaneous and visceral adipose tissue at the expense of lean tissue when compared to matched-age and weight European controls88. our results confirm both ancestry and temporal complexity shaping the still on-going process of genetic structuring of South Asian populations. T. because relevant life-history traits. We highlight a few genes as candidates of positive selection in South Asia that could have implications in lipid metabolism and etiology of type 2 diabetes.M. However. and University of Tartu. making them worthy candidates for further functional examination. This research was supported by Estonian Basic Research grant . Barna for help calculating iHS and XP-EHH scores. and P.77 it would be naive to expect that a relationship between past selective processes and present-day disease would be mechanistically simple and explainable by variation at a handful of genetic loci.B. V. Further studies on data sets without ascertainment and allele frequency biases such as sequence data will be needed to validate the signals for selection. Anton for technical assistance. J. over a much longer time span. and L.. 
Migliano. Acknowledgments We thank A. the loci we identify could be theoretically considered responsible for some of the present type 2 diabetes epidemic in India. Aasa. it is tempting to hypothesize that past natural selection might have influenced genetic variation at these loci to increase infant survival. Hilpus..

in the Top 1% XP-EHH All India Results. G..V.. new data and used public data used in the study. Reported by DAVID Analysis (ZIP 7 kb) Table S4. Government of India to L. http://www.org/home Online Mendelian Inheritance in Man (OMIM).M.M. University of Tartu and with University of Cambridge Bioinformatics and Computational Biology services. http://www.H.M.R. Significantly Enriched GO Terms.R. and R. Information on Samples Used in the Study (ZIP 84 kb) Includes information on both.S.M. and Council of Scientific and Industrial Research..M. and SF0180026s09 to M.N. Estonian Science Foundation grants (7858) to E. and R.Y. and (8973) to M.K.omim..1000genomes. Figures S1–S13 (PDF 4577 kb) Table S1. Tartu University grant (PBGMR06901) to T. Calculations were carried out in the High Performance Computing Center... Reported by DAVID Analysis (ZIP 45 kb) Table S3. in the Top 1% iHS All India Results. R. Top 20 Most Significant iHS Windows in the All India Results (ZIP 26 kb) Table S5.. Before FDR Correction.. Top 20 Most Significant iHS Windows in the All India Results (ZIP 21 kb) Table S6. Before FDR Correction.SF0270177As08 to R. Estonian Ministry of Education and Research (0180142s08) and European Commission grant 245536 (OPENGENE) to M. Significantly Enriched GO Terms. Population Aberrations for PCA on Figure S1 (ZIP 35 kb) Web Resources The URLs for data presented herein are as follows: The 1000 Genomes Project. European Commission grant (ECOGENE 205419) to M. Supplemental Data Document S1. and K.org .. I. B.T.G.V. Table S2.
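The positional clustering effect discussed above, and the windowing approach proposed to counter it, can be illustrated with a toy calculation. The sketch below is only an illustration under stated assumptions: it uses a plain hypergeometric test in place of the enrichment tool used in the study, and the genome layout, gene names, GO terms, and choice of top windows are all hypothetical. It compares an enrichment p-value computed over genes with one computed over the windows those genes occupy, followed by a Benjamini-Hochberg adjustment.

```python
from math import comb


def hypergeom_tail(k, population, successes, draws):
    """P(X >= k) when `draws` units are sampled without replacement."""
    top = min(successes, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, top + 1)
    )
    return favourable / comb(population, draws)


def enrichment_p(annotated, selected, universe):
    """One-sided p-value that `selected` is enriched for `annotated` units."""
    return hypergeom_tail(
        len(annotated & selected), len(universe),
        len(annotated & universe), len(selected),
    )


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical toy genome: 20 windows. Window 0 holds a five-gene family that
# shares one GO term; windows 1-19 hold three unrelated genes each.
gene_window = {f"fam{g}": 0 for g in range(5)}
gene_window.update({f"g{w}_{j}": w for w in range(1, 20) for j in range(3)})

# term_a: the clustered family plus one gene in each of windows 1-4.
term_a = {f"fam{g}" for g in range(5)} | {f"g{w}_0" for w in range(1, 5)}
term_b = {"g10_0", "g15_1"}   # a second, unremarkable term (hypothetical)
top_windows = {0, 5}          # windows assumed to carry the strongest signals

all_genes = set(gene_window)
all_windows = set(gene_window.values())
top_genes = {g for g, w in gene_window.items() if w in top_windows}

gene_level, window_level = [], []
for term in (term_a, term_b):
    # Collapse the term's annotation from genes to the windows they occupy.
    term_windows = {gene_window[g] for g in term}
    gene_level.append(enrichment_p(term, top_genes, all_genes))
    window_level.append(enrichment_p(term_windows, top_windows, all_windows))

print("gene level  :", benjamini_hochberg(gene_level))
print("window level:", benjamini_hochberg(window_level))
```

In this toy setting the clustered family makes the first term look strongly enriched when genes are counted individually, because five hits come from a single window. Once annotations are collapsed to windows the same term contributes one hit and is no longer close to significance, which mirrors the more conservative behaviour attributed to the windowed analysis in the text.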

. (1999)..(1999). P. M.. W... The prehistory of Sri Lanka..E. N.. Metspalu. M. M. Endicott. Boivin.M. Curr. 26. Oppenheimer.. and Villems. Kivisild. S.D.. T.S. Lahr. Kaldma. M.. T.. D... Sci. Deep common ancestry of indian and western-Eurasian mitochondrial DNA lineages.J. C. Singh. PubMed 5 Metspalu. Middle Paleolithic . W.D... Koshy.. S. (Sri Lanka: Department of Archaeological Survey). M. S. R. Trivedi. Y. (2004). Chakraborty. Parik. 25. C. Biol. Sabo.References 1 Xing. pp.U..... Korisettar... R. 1331– 1334. et al.. Sitalaximi. A prehistory of Indian Y chromosomes: evaluating demic diffusion scenarios. Gilbert. Q. Genome Biol. Serk. an ecological perspective.. R. CrossRef| PubMed 7 Deraniyagala.PubMed 8 Petraglia.. J. 843–848.M.. R..Most of the extant mtDNA boundaries in south and southwest Asia were likely shaped during the initial settlement of Eurasia by anatomically modern humans. CrossRef | PubMed 2 Atkinson. et al.J..S.. Metspalu. (1992). J. Papiha. Gibbs. K.. T. Gaikwad. Ditchfield. M. D..M.. Himabindu.. P.. Muzny.. BMC Genet. S.T. Jones. Acad. Genetic diversity in India and the inference of Eurasian population expansion.. Papiha. Watkins.D. Biol. R. M. S. A. mtDNA variation predicts population size in humans and reveals a major Southern Asian chapter in human prehistory. Hudjashov.J. Banerjee. K. J.A. Reidla. et al. A.. G.. Karmin..B. S.... (2010). Watkins. M.. The place of the Indian mitochondrial DNA variants in the global network of maternal lineages and the peopling of the Old World. Proc.... S. 468–474. Dixon. Huff.S. J. USA 103. F. eds. Clarkson. Parik.. and Yu. T. 5. Natl... Bamshad. P. Behar. G. Abstract | |CrossRef | PubMed 4 Kivisild. E. Hu. Mol.. T. Kivisild. Evol. 135–152. D. Parik.(2007). Kaldma. R. In Genomic diversity. R113. L. Kaldma. C. CrossRef | PubMed 3 Kivisild. CrossRef | PubMed 6 Sahoo.(2006). 11.. 9.. Metspalu. Jorde.. M. M.. A.. and Drummond. Metspalu. K. Gray. (: Kluwer Academic/Plenum Publishers). M.. Metspalu. Laos. R... J.. 
Deka.S. Pyle. J. Bamshad. (2008). et al. E. M.

. Novembre. S.. 50– 51. R. Peltonen. (2011). et al. Cambon-Thomsen. Peltonen. Bryc. 453–456. (2009)..K. (2010).. et al. 52–58.H. Marks. CrossRef | PubMed 16 Auton. A.T. M.. Indap. et al. Science 326. Altshuler. Out of Africa: new hypotheses and evidence for the dispersal of Homo sapiens along the Indian Ocean rim. 87.. D. 288–311... L. Assawamakin.. Gutenkunst. Genetic landscape of the people of India: a canvas for disease gene exploration. 261–262. CrossRef | PubMed .A. et al.M. J. Peltonen. Nature 467. L. de Toma...H.. Gibbs. Wright.. L..E. H. V.C.D.. J. G. and Uerpmann. S.D. Schaffner. Science 331. B.... K.N. 114–116. 1061–1073.Q. Cazes.I... Degenhardt.M. A map of human genome variation from population-scale sequencing. Ahmed. M.... 795–803.. C.F.A... Ann.. Piouffre. A. Integrating common and rare genetic variation in diverse human populations..R.International HapMap 3 Consortium (2010). C. Reynolds. A. Yu. S. Genet. D. J. Hum. J.. L. A. and Clarkson. Bodmer. Calacal. A. F. A. Y. Bhak. H. PubMed 14 1000 Genomes Project Consortium (2010). Genome Res. M..M. V. CrossRef | PubMed 13 Cann..A. A human genome diversity cell line panel.. Nature 470. Chen. CrossRef | PubMed 12 Altshuler.. (2011)... R. 3–20..E. Bonne-Tamir. M. Global distribution of genomic diversity underscores rich complex history of continental human populations.CrossRef | PubMed 9 Petraglia. R. Usik. Biol. Chaurasia. CrossRef | PubMed 17 Indian Genome Variation Consortium (2008). N... W. The southern route ―out of Africa‖: evidence for an early expansion of modern humans into Arabia.HUGO PanAsian SNP ConsortiumIndian Genome Variation Consortium (2009). 19. J. Boyko.. 37. J. A.. Gibbs. M. Boivin. Jasim. Science 317. Parker. Archaeology: Trailblazers across Arabia. E. CrossRef | PubMed 10 Armitage.J.. Fuller. A. Nature467. Brahmachari. K. L. CrossRef | PubMed 15 Abdulla.. Dermitzakis.A. D.F.. S.. (2002). A.P. Science 296.F.G. CrossRef | PubMed 11 Petraglia.D. Chen. Lohmueller... I. C.. Haslam. Legrand. 
M. 1541–1545. Mapping human genetic diversity in Asia. Chen..assemblages from the Indian subcontinent before and after the Toba supereruption. Morel.M. Bodmer..

J. S. Barsh. D. Bamshad. Coop. Metspalu. e215. G.M. CrossRef | PubMed 22 Quintana-Murci..A.... Nature 461. 2. Population genetic structure in Indian Austroasiatic speakers: the role of landscape barriers and sexspecific admixture. G. C. Genome Res.. Cann. (2008). I.. Yudkovsky.... M. H. 826–837. Barsh. Brassington. L. D.. Kudaravalli... O... J.S. Am. Kutuev.... 827–845.. J. PLoS Genet. Feldman. S. The genome-wide structure of the Jewish people. Mol. Gonzalez-Quevedo...M. Absher. Blum.M. NinoRosales. 1607–1618. Hegde. D.. N. B..... (2009). (2004). A...Z. Behar. et al. Tang. Signals of recent positive selection in a worldwide sample of human populations. Absher. J.S. Biol.(2010). et al.. (2009). Wooding. Nature 466...T.G. S.S. et al. G.. R. Li.. CrossRef |PubMed 24 Behar. K.V. C.. B.W. Prasad. W.E. Yunusbayev.S.J.V. CrossRef | PubMed 20 Watkins. 13.. Nguyen.. and Singh. G. G.. Chaix. Price. Genome Res.. Srinivasan... 1013–1024. S. Worldwide . I. 238–242. L.. Zapata..(2011).S.M..M. Ramachandran.Z.. A.. M. Southwick.. A.M. A. Metspalu. B.M. Where west meets east: the complex mtDNA landscape of the southwest and Central Asian corridor. L..-M. Genet.. et al.. L..K.R. J. N.L. M. Hudjashov. M. Rootsi.. (2006). Ninis. (2003).. M. R.. Sayar. D.. S. Mahajan. Das. S.. Molinari. 74...A. Al-Zahery. et al. P.G.. Scozzari. 489– 494. Abstract | Full Text | PDF (1012 kb) | CrossRef | PubMed 23 Pickrell..M. J. M. M. R. Chaubey.R.. Romero. G. H.. D. Behar. et al.M. Soares. R. J. Reconstructing Indian population history.. Santachiara-Benerecetti... CrossRef | PubMed 19 Rosenberg. Y.L. Hum. Evol. et al. Novembre.. Myers. 19. Mägi. A.18 Reich. Cavalli-Sforza.. P. S. E. Rengo. R. Rosset. D. G. G.. CrossRef | PubMed 21 Chaubey. H... L. van Oven. Genetic variation among world populations: inferences from 100 Alu insertion polymorphisms. Rootsi. Parik... Wells. A. CrossRef | PubMed 25 Li. 
Low levels of genetic divergence across geographically and linguistically diverse populations from India. Choi. Semino. M.. M. 28. Patterson... Thangaraj. C.S. Walker. V.L. Casto. S. Carroll.. Feldman. Ostler. N. Rogers. Metspalu.

(2010).. Lindgreen.. Moltke. Behar. and Reich. 229–232. Metspalu.1093/molbev/msr221 | Published online September 13.. A. Varendi.K.D. D. (2009). I.CrossRef | PubMed 26 Yunusbayev. Sklar..E.doi:10. PubMed 31 Alexander... Biol. E.J. Marjoram.H.R. and Wall. C.. Metspalu. Bender. P. M. Gupta.D..Abstract | Full Text | PDF (594 kb) | CrossRef | PubMed 28 Patterson.... B. Li. (2006). M.. J. and Lange. 19. 2011.. Science 319. M. CrossRef | PubMed 32 Rasmussen.. The Caucasus as an asymmetric semipermeable barrier to ancient human migrations. 81. Maller. Y. M... 1655–1664. PubMed 27 Purcell. Ancient human genome sequence of an extinct Palaeo-Eskimo.L. T. C. Albrechtsen. M. Khusainova. Genetics 182. I.. L. Pedersen. H. .C...human relationships inferred from genome-wide patterns of variation.. S. Population structure and eigenanalysis. 559–575. and Clark.G. e190... Novembre. PLINK: a tool set for whole-genome association and population-based linkage analyses. R. Spatial Statistics and Geographic Exegesis. 757–762. Fast and flexible simulation of DNA sequence data.I.S. Ferreira. S. and Cockerham. PubMed 34 Chen.CrossRef | PubMed 30 Rosenberg.... Daly. Estimating F-Statistics for the Analysis of Population Structure. K. Genome Res. (2009). Fast model-based estimation of ancestry in unrelated individuals. 1358–1370.CrossRef | PubMed . Rootsi. Todd-Brown.19.R. 2.S. D. C.. PLoS Genet. 217–231..M. and Andersen. Methods for human demographic inference using haplotype patterns from genomewide single-nucleotide polymorphism data. de Bakker. 1100–1104. Evolution 38. P.... Järve.. Evol. (2009).. P.A. J. CrossRef | PubMed 29 Weir.. K.W. Kutuev.. K. D. (2011). G.. Neale. A.. 136–142. Am. J. E.S. Genet.. A. Hum. (2011). J. et al. Metspalu.. Nature 463. Sahakyan. Price. et al. M. R. Kivisild. Mol. CrossRef | PubMed 33 Lohmueller.. D. S.D. B.. N. Metspalu.(2007). B. (1984). K. Version 2. et al. Bustamante. PASSaGE: Pattern Analysis. Genome Res. J. Thomas.Methods in Ecology and Evolution 2. M.

. and Lempicki. Naidu.... Sherman. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. R.35 Schaffner. Reich. CrossRef | PubMed 41 Bamshad. Nucleic Acids Res.N. CrossRef | PubMed . Genetic evidence on the origins of Indian caste populations. D. Hum. CrossRef | PubMed 36 Frazer. Protoc. R.. Kidd.. Hum. 81.A. A. A. S..R. CrossRef | PubMed 39 Huang. Pritchard. S. Sherman.W. and Feldman.E. and Bamezai. Genet. L. (2009). Sharma.V.A.. Nat. (2001). Reddy. T. J. Dixon. P. et al.K..K. Kivisild.J. CrossRef | PubMed 37 Browning. Zhivotovsky. Ricker. K.A. R. D. PubMed 40 Huang.. Ballinger.. 1–13.. S.G. 47–55. Bhat. C. Rasanayagam.. Darvishi. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Genome Res.1 million SNPs.F.. D. M. Jena. Belmont.S. A. 54.. J. K...T.K. Cox... CrossRef | PubMed 42 Sharma. L. Science 298. B. B. Daly. H. Hinds... R.. D. Bhanwer. Rao. and Altshuler.. M. Weber. N..W. Calibrating a coalescent simulation of human genome sequence variation. P. J.K.. (2009). W. J.E.. 4. K. (2002). P.. A second generation human haplotype map of over 3. 1084–1097.K. E. W. Abstract | Full Text | PDF (571 kb) |CrossRef | PubMed 38 Rosenberg.. W.S. 44– 57.R. Foo.. J.The Indian origin of paternal haplogroup R1a1∗ substantiates the autochthonous origin of Brahmins and the caste system. and Lempicki.. Prasad. Watkins.A. Rai.. S. Gabriel.. A. 37. Genetic structure of human populations. (2005). M.L. 11. 851–861.International HapMap Consortium (2007).J.M.G.A. Tiwari. Genome Res.B. S. D.M.A. C. 15. Boudreau. et al.. 994– 1004. Gibbs.. Am.M.. Stuve.. Leal. M... Genet....L. Nature 449. (2009). and Browning.. P.T.L.. Hardenbol. B.A. M. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. J.. Singh. B. Cann. 1576–1583.. S. B. 2381–2385. (2007).

S. and Shoelson. Gene ontology: tool for the unification of biology.. et al.. Pickrell. 4. CrossRef | PubMed 47 Ashburner.P. Dolinski. Coop. Nutr.K. Tandon. Insights from the developing world: thrifty genotypes and thrifty phenotypes. P. Soc..A.W.. CrossRef | PubMed 44 Voight. 599– 602.. M. J.. Genet. Rayco-Solon. Chauhan. Dwivedi. S.. B. Genet. P.. Feldman. J.A.L. A..The Gene Ontology Consortium (2000). Gaudet. Absher.. Hostetter. e1000500.M. 38. PubMed . K. 35.. Davis. and Moore. Nat. J. CrossRef | PubMed 49 Prentice.E. Xie. J. E.C. CrossRef | PubMed 52 Tabassum.. et al. PLoS Biol...International HapMap Consortium (2007).A. 153– 161.M. A. Two new substrates in insulin signaling. Dhe-Paganon. Byrne. 64. Cotsapas. J...M..K.e72.. Botstein. Wen. McCarroll. 913–918. R. Butler. Nat..K. BMC Med. S. G. H. Mahajan. Wen.. (2005).A. M. (2003). 1251–1260.. R. J.K.43 Conrad. S. Genet. (2003). (2010). 408–418. Rosenberg. Ma... B. Genome-wide detection and characterization of positive selection in human populations. Lancet 375. The role of geography in human adaptation. P.F. Kudaravalli. C. D. D. J... Cavalli-Sforza.A. and Bharadwaj. 25.. R. CrossRef | PubMed 46 Coop.. A worldwide survey of haplotype variation and linkage disequilibrium in the human genome. N. J. G. X.. S.Q. Varilly. Dwight.. Chem.P. (2006). J.. E. Cherry....T. Eppig.. M.. S.. Li.F. J. C. C. and Snehalatha. Lee. J. L. Lohmueller... Novembre. S. Kudaravalli. PLoS Genet. CrossRef | PubMed 48 Diamond. (2009). 25323–25330. J..... Jakobsson. CrossRef | PubMed 51 Cai. CrossRef | PubMed 50 Ramachandran.H.. J. N. Ball. Ghosh.. O. Fry.. Nature 423. and Pritchard. Evaluation of DOK5 as a susceptibility gene for type 2 diabetes and obesity in North Indian population. J.. (2006). A.. R. Diabetes in Asia. Blake.E.C. X. D. 25–29. (2010). CrossRef | PubMed 45 Sabeti. A map of recent positive selection in the human genome.Nature 449. Melendez. D.S. and Pritchard.. S.. IRS5/DOK4and IRS6/DOK5. Wall.D. 5.. and Pritchard. Myers. 
G.. P. The double puzzle of diabetes.Proc. A. D.. X. 278. Biol. 11.

Am. 1466– 1475.. S..D.M.. Ko. (2008). Y.D. J. J. 99.. Madrid. (2009). Arnett. Corbalán. Shen. Omura. Nakamura.K. 595–601.H.Disruption of the clock components CLOCK and BMAL1 leads to hypoinsulinaemia and diabetes.. Okuyama.... M. J. Y. C. CrossRef | PubMed 59 McPherron. Sharma. Y. Lee. Clin.Y. type-3) is overexpressed in hormone-refractory prostate cancer.C. Shen.D..364–369.C. Invest.. Parnell.. and Ordovas.. Leung.K... M. K. Cancer Sci. M. Honma. C. Su. J. PubMed 58 McPherron. K.M. J. Mo. M. J.. G. CrossRef | PubMed 61 Mitchell.. (2002). and Ordovas. A.. A.A. E. Tsai. and Ordovas. Lai. Tamura. J. H. Ramsey. M. Tsai. Arnett. Y.. M.C.D. Chung.. Nutr. Eur.J. (2010). J. Kobayashi. Morales. Nature 466. D.. Genetic variants in humanCLOCK associate with total energy intake and cytokine sleep factors in overweight subjects (GOLDN population). 18. D.H. Ivanova. L. and Lee. Lee. (1999)...C. CrossRef | PubMed 60 Bass.M. Nature 387.C.. J. S.M. Lai.D. Oldham. Osepchook.-J.. Buhr. A. CrossRef | PubMed 55 Garaulet... S.C. 90. Vitaterna. Genet. H.. L. Domest. Regulation of skeletal muscle mass in mice by a new TGF-beta superfamily member. C.D..Q. (2010).53 Marcheva. M.. 81–86. and Lee. (1997). M. Lawler. et al... 516–523.. A. J. C. Myostatin is a human placental product that . 17.. J.C.J... Lee. (2006). K. Clin. E. CrossRef |PubMed 54 Garaulet. M.. Growth factors controlling muscle development.Q. C. S.. Parnell. Novel 5 alpha-steroid reductase (SRD5A3. Int J Obes (Lond) 34. McMahon.M.-C.Y.. Y. 627–631. R. Endocrinol. J. CrossRef | PubMed 57 Uemura. 83– 90... 109. Baraza. CLOCK gene is implicated in weight reduction in obese patients participating in a dietary programme based on the Mediterranean diet. M... S. 191– 197. CLOCK genetic variation and metabolic syndrome risk: modulation by monounsaturated fatty acids.. and Bass. C. B.. and Kambadur. and Nakagawa. Hum. Suppression of body fat accumulation in myostatin-deficient mice. Anim..CrossRef | PubMed 56 Garaulet.. (2010). J.

Genet. Lawrence. P. (2009). Genet. (2005). J. (1996). M.W. E. 26430– 26434. 271... Ferrell..A. Andrés. J.E. A. and Majumder.CrossRef | PubMed . 79. Abstract | Full Text | PDF (282 kb) | CrossRef | PubMed 65 Ramachandran. D. Hellmann. 79.M.T. Natl. M... J. Nat. 19. Genet. M.. Patterson. N. S.. Boyko..P. Mullikin. CrossRef | PubMed 64 Saunders. 39. Hubisz. Rosenberg.. Human adaptive evolution at Myostatin(GDF8). CrossRef | PubMed 66 Keinan. Genet.A. A.M. 366– 371.. Gutenkunst. Roseman. J. Hum. 41–46.Hum. Primary structure and tissue-specific expression of human beta-hydroxyisobutyryl-coenzyme A hydrolase.. J..J. Non-Darwinian estimation: my ancestors. F. Prugnolle. Measurement of the human allele frequency spectrum demonstrates greater genetic drift in East Asians than in Europeans. CrossRef | PubMed 63 Nielsen.C. USA 102..W.C. and Long.regulates glucose uptake..W. 703–710.. Congruence of genomic and ethnolinguistic affinities among five tribal populations of Madhya Pradesh (India). (2009).C.. Genome Res. 19..L. (2006). 15942–15947. J.-H. Clin. Torgerson. W. 1434– 1437.. M.. Support from the relationship of genetic and geographic distance in human populations for a serial founder effect originating in Africa. B. a regulator of muscle growth. A.. R. 91. (2005). et al.. Geography is a better determinant of human genetic differentiation than ethnicity.. R. my genes' ancestors. 118.. Endocrinol.. Harper. N. R. M..C. Li. M. Cargill. (2007). Deshpande. M. and Harris. A.. Adams. Jaskiewicz. L. Bunting. Mitra. CrossRef | PubMed 69 Weiss. I. and Reich. D. Sci. C. J. Feldman. Chem. and Balloux.. Good. M. F. K. O. Metab. Biol.. CrossRef | PubMed 68 Mukherjee. and Cavalli-Sforza.D. Y.. R.M. N. E. Albrechtsen.. CrossRef | PubMed 67 Manica. Proc. J.. Darwinian and demographic forces affecting human protein coding genes. 1251–1255. J. A.. Huang. 1089–1097. Genome Res. 838– 849..A. Chakraborty.. Shimomura. Am. and Nachman. Acad.. CrossRef | PubMed 62 Hawes. (2000). J.

S. Abstract | Full Text | PDF (312 kb) | CrossRef | PubMed . Stepanov.. Pitchappan. Metspalu. Hum. P. J. M. E. Metspalu.. 231–235.. O. 13... L. et al. S.A...K. V. Battaglia. J. T.. Eur. 10244–10249. Hegay. Chow. Bhattacharyya.Abstract | Full Text | PDF (1562 kb) | CrossRef | PubMed 75 Martínez-Cruz.A.70 Wells. Dey. et al. Semino. with special reference to peopling and structure. Austerlitz. S. Ségurel. and Coop.M. Genet. The Eurasian heartland: a continental perspective on Y-chromosome diversity.J.. Lin. Roy.. R. Hum..(2010). and Villems.. Yuldasheva. Shanmugalakshmi. Roy.. R. 20. R..K. (2003). 2277– 2290. S. 18. Curr.M. 72. G. (2011). C. Independent origins of Indian caste and tribal paternal lineages. King. G.The genetic heritage of the earliest settlers persists both in Indian tribal and caste populations. N..479–484. L.. Jin. Metspalu. Ruzibakiev. B. Hum... Nasyrova. CrossRef | PubMed 71 Basu. R. J. Su.... Parik.. Biol.Bioessays 29.. Separating the postGlacial coancestry of European and Asian Y chromosomes within haplogroup R1a.. Genet. Mastana. 313–332... M. CrossRef | PubMed 77 Pritchard. et al.A. Zhivotovsky.. S.A. et al.. M. Georges. and polygenic adaptation. R. B.. G. R.. S. Sirajuddin.. Natl.. V.. Kaldma..P.. F.. CrossRef | PubMed 72 Cordaux... Aunger. Myres. In the heartland of Eurasia: the multilocus genetic landscape of Central Asian populations. Am. M. Proc. BlueSmith. and Stoneking. J. M.Ethnic India: a genomic view. Underhill. Sci. I. Nasidze. J.V. F.. T.. Bentley. CrossRef | PubMed 74 Kivisild. N.. Biol... soft sweeps. N. Chakraborty. USA 98. (2010). Quintana-Murci... J.. R. A.. Curr. S. Eur. Aldashev. R. S. Roy. Adojaan. I. Rootsi.. Mukherjee.. Tolk. B. Acad.. (2004). Abstract | Full Text | PDF (73 kb) | CrossRef | PubMed 73 Chaubey. et al. Genet.. A. Rootsi. PubMed 76 Underhill. 19.E. S.. K. M. J. Banerjee. Théry. (2001). A. Genome Res. B. M.. Metspalu. T. 91– 100. 14.. L. The genetics of human adaptation: hard sweeps. H. (2003). (2007). 
Vitalis.. R208– R215. 216–223. Pickrell. L.S. Evseeva. M. N... P. Sengupta. Peopling of South Asia: investigating the caste-tribe continuum in India... Kivisild..

Int. E.. and Herman. 596–599. Neonatal anthropometry: the thin-fat Indian baby.. Zuk. Metab. Diabetes Care 27.S. 883– 886. S. Deshpande. CrossRef | PubMed 82 King. Relat. Yamuna. Science 327. and prevention.. Obes. (2009). C. Rege.J. and projections.S. Hirve. 60. Non-industrialised countries and affluence.S. and Khurana. 5575– 5580.. S. C. Relat. and Kellingray. C. et al.. PubMed 85 Fall. Br. C. J.H.. J.. numerical estimates... Cardiovascular risk factors in the normoglycaemic Asian-Indian population—influence of urbanisation.S. E. J. R. N. CrossRef | PubMed 86 Snehalatha. E. determinants.S. Fall..K.. I..A.78 Grossman. Med. G.. H.J.H.. Joglekar. Karlsson. Lubree. C.. A. Diabetes Care 21. CrossRef | PubMed . C. (2004). S.. 33–50. Angelino. Joglekar.... Disord. Diabetologia 52.. 1414– 1431. H. CrossRef | PubMed 81 Ramachandran.P. CrossRef |PubMed 79 Yajnik.. A. Naik. (1998). R.S. L. S. Metab. G.D. (2008).H. (2002). Shylakhter. S.. (2003).. Barker. W. Mary. CrossRef | PubMed 84 Misra. and Yudkin..D. (2009).. 87.H. C. 1047–1053.V. A. A. C... Global burden of diabetes.. Roglic. Morales. Disord. 173– 180. E.E.. Sicree.(2010). S. CrossRef | PubMed 83 Wild.. S. Clin. O.. (2001). Adiposity and hyperinsulinemia in Indians are present at birth.G. 19952025: prevalence. 7. Deshpande. Aubert. S. Murugesan. 497– 514. The Pune Maternal Nutrition Study. Coyaji. A composite of multiple signals distinguishes causal variants in regions of positive selection. Bull. A.. D. Hostetter. The metabolic syndrome in South Asians: epidemiology.. Byrne. Metab. Global prevalence of diabetes: estimates for the year 2000 and projections for 2030. Rao. J. S.. 893–898. M. and King. Green.. K.. Diabetes Care 31. S. Garber. High prevalence of diabetes and cardiovascular risk factors associated with urbanization in India. Frieden.R. and Ramachandran. CrossRef |PubMed 80 Yajnik. and Snehalatha. Syndr.S. 27. H. Endocrinol.

87 Banerji, M.A., Faridi, N., Atluri, R., Chaiken, R.L., and Lebovitz, H.E. (1999). Body composition, visceral fat, leptin, and insulin resistance in Asian Indian men. J. Clin. Endocrinol. Metab. 84, 137–144.
88 Lear, S.A., Humphries, K.H., Kohli, S., Chockalingam, A., Frohlich, J.J., and Birmingham, C.L. (2007). Visceral adipose tissue accumulation differs according to ethnic background: results of the Multicultural Community Health Assessment Trial (MCHAT). Am. J. Clin. Nutr. 86, 353–359.
89 Lear, S.A., Kohli, S., Bondy, G.P., Tchernof, A., and Sniderman, A.D. (2009). Ethnic variation in fat and lean body mass and the association with insulin resistance. J. Clin. Endocrinol. Metab. 94, 4696–4702.
90 Chandalia, M., Lin, P., Seenivasan, T., Livingston, E.H., Snell, P.G., Grundy, S.M., and Abate, N. (2007). Insulin resistance and body fat distribution in South Asian men compared to Caucasian men. PLoS ONE 2, e812.

Publication Information
Received: June 28, 2011
Revised: September 6, 2011
Accepted: November 12, 2011
Published online: December 8, 2011
