Department of Biological Sciences, Florida International University, University Park, Miami, FL 33199
Departamento de Genética, Facultad de Biología, Universidad de Vigo, Spain
3
Department of Medical Genetics, Sanjay Gandhi Postgraduate Institute of Medical Sciences, Lucknow 226014, India
2
KEY WORDS
ABSTRACT
The mtDNA composition of two Muslim sects from the northern Indian province of Uttar Pradesh, the Sunni and Shia, has been delineated using sequence information from hypervariable regions 1 and 2 (HVI and HVII, respectively) as well as coding region polymorphisms. A comparison of these data with those from Middle Eastern, Central Asian, North East African, and other Indian groups reveals that, at the mtDNA haplogroup level, both the Indo-Sunni and Indo-Shia populations are more similar to each other and to other Indian groups than to those from the other regions. In addition, these two Muslim sects exhibit a conspicuous absence of West Asian mtDNA haplogroups, suggesting that their maternal lineages are of Indian origin. Furthermore, it is noteworthy that the maternal
lineage data indicate differences between the Sunni
Present-day India is represented by a complex sociocultural mosaic composed of 20 major languages and ~750 dialects (Kosambi, 1991) partitioned into ~2,000 castes and tribal groups (Puppala, 1996). The vast majority of these ethnic populations (at least 80%) are Hindu and socially organized into castes and subcastes (Karve, 1968). Tribal groups comprise about 8% of the total Indian population (Roychoudhury et al., 2001). A third socio-religious group, the Muslims, who are represented by two major sects, the Sunni and Shia, constitutes ~12% of the total Indian population (Majumder, 2001).
The Sunni and Shia Muslim sects arose from a major religious schism concerning the rightful lineage of the Prophet Muhammad's successor in the decades following his death (632 AD). The presence of these Islamic settlements in India may be a result of at least three distinct campaigns (Farah, 2003) initiated from different geographic regions. In 711 AD, an Arab military invasion precipitated the formation of the Sind Indo-Muslim state in the Indus delta region (Keay, 2000). A few centuries later, between 997 and 1027 AD, Muslim converts from the Central Asian Turkic tribe staged multiple raids into the northwest province of Punjab. Finally, during the 13th and early 14th centuries AD, Afghan and Persian Muslims arrived from the northwest, reached New Delhi and, from there, penetrated into points east, west, and south (Wolpert, 1991). In
addition to different possible source populations, the
Indo-Muslim groups may have subsequently evolved
through several distinct cultural modes: cultural diffusion, elite dominance via military expansions, and colo-
© 2007 WILEY-LISS, INC.
West Central
Levant
Near East
South Central
East Central
West Central
South
Island
Country
Population
Language
Social
status
Reference
34    Egypt          Gurna                        Afro-Asiatic
118   Egypt          Arabs/Berbers                Afro-Asiatic
105   Oman           Arabs                        Afro-Asiatic
90    Qatar          Arabs                        Afro-Asiatic
131   UAE            Arabs                        Afro-Asiatic
50    Yemen          Arabs                        Afro-Asiatic
205   Central Asian  Kirghiz, Kazakhs, Uighurs    Altaic
388   Turkey         Anatolia                     Altaic
192   Armenia        Armenian                     Indo-European
139   Georgia        Georgian                     Caucasian
39    Jordan         Arabs                        Indo-European
69    Syria          Syrian                       Indo-European
117   Palestina      Palestinian                  Indo-European
50    Arabia         Arabs                        Indo-European
451   Iran           Arabs                        Indo-European
42    Iran           Persian                      Indo-European
37    Iran           Gilaki                       Indo-European
20    Iran           Kurdish                      Indo-European
17    Iran           Lur                          Indo-European
216   Iraq           Iraqi                        Indo-European
41    Turkmenistan   Turkmen                      Altaic
44    Tajikistan     Shugnun                      Indo-European
100   Pakistan       Karachi                      Indo-European
44    Pakistan       Hunza                        Isolate
38    Pakistan       Brahui                       Dravidic
33    Pakistan       Makrani                      Indo-European
44    Pakistan       Parsi                        Indo-European
59
60
185
60
60
35
50
25
19
53
58
53
55
98
Uttar Pradesh
Shia
Sunni
Bhargava
Brahmin
Chaturvedi
Rajasthan
West Bengal
Rajbhansi
Bangladesh
Gujarat
Maharashtra
Karnataka
Kerala
Reddy
Thogataveera
Andra Pradesh
Sri Lanka
Tamil Nadu
Andaman
Indo-European
Indo-European
Indo-European
Indo-European
Indo-European
Indo-European
Indo-European
Indo-European
Indo-European
Indo-European
Indo-European
Dravidic
Dravidic
Dravidic
Indo-European
Indo-European
Indo-European
Dravidic
Andamanese
50
40
132
Rajasthan
West Bengal
Bangladesh
Gujarat
Maharashtra
Karnataka
Kerala
Andra Pradesh
Sri Lanka
Tamil Nadu
Andaman
Muslim
Muslim
Caste
Caste
Caste
Caste
Caste
Caste
Caste
Caste
Caste
Tribe
Caste
Caste
Tribe
Tribe
Caste
Tribe
Tribe
2004
2004
2004
2004
2004
2004
2004
2004
2004
2004
2004
Present study
Present study
Palanichamy et al., 2004
Palanichamy et al., 2004
Palanichamy et al., 2004
Metspalu et al., 2004
Metspalu et al., 2004
Palanichamy et al., 2004
Metspalu et al., 2004
Metspalu et al., 2004
Metspalu et al., 2004
Cordeaux et al., 2003
Metspalu et al., 2004
Palanichamy et al., 2004
Palanichamy et al., 2004
Cordeaux et al., 2003
Metspalu et al., 2004
Metspalu et al., 2004
Palanichamy et al., 2004
Collection of samples
Blood samples were collected in EDTA Vacutainer
tubes. The nuclear fraction from peripheral leukocytes
was isolated as previously described (Luis et al., 2004).
Ethical guidelines were adhered to as stipulated by the
institutions involved in the research project. The individuals from the Indo-Sunni and Indo-Shia communities
gave their informed consent prior to their participation
in the study.
Haplogroup assignment
Genomic DNA was isolated from the peripheral leukocyte fraction of whole blood as previously described
(Agrawal et al., 2005). Both HVI and HVII regions were
PCR amplified using primers previously described
(Stoneking et al., 1991) and then sequenced and aligned
to the revised Cambridge reference sequence (rCRS)
(Anderson et al., 1981; Andrews et al., 1999).
Haplogroup classification is based on the sequence information from the mtDNA control region (HVI and nucleotide position 73 of HVII) and RFLP analyses as performed by Macaulay et al. (1999), assaying the restriction status at the following sites: 4216 Nla III, 4830 Hae II, 5176 Alu I, 7025 Alu I, 9052 Hae II, 10397 Alu I, 10871 Mnl I, 12308 Hinf I, 12629 Ava II, 12703 Mbo II, 14765 Acc I, and 14766 Mse I. Additional studies (Kivisild et al., 1999, 2003, 2004; Richards et al., 2000; Salas et al., 2002, 2004; Metspalu et al., 2004; Quintana-Murci et al., 2004) were consulted to further delineate the haplotypes.
When feasible, the amplification protocol and primer sets
used to assay coding regions followed the procedure
described in Torroni et al. (1996). In some situations,
however, multiple bands resulting from the digestion of
monomorphic restriction sites present within the large
amplified fragment interfered with the determination of
restriction status on the agarose gel. Thus, for these sites, we developed an assay involving internal primers and much smaller amplicons (~200 nucleotides in length). This strategy, referred to as reduced amplicon RFLP, was very successful and resulted in easily discernible restriction digest patterns on the agarose gels. The Bayesian 0.95 credible regions (0.95 CR) are calculated for each haplogroup by SAMPLING, a program kindly provided by Vincent Macaulay.
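As an illustration of what a 0.95 credible region for a haplogroup frequency may look like computationally, the following Python sketch takes the central 95% interval of a Beta(k + 1, n - k + 1) posterior, that is, a binomial likelihood with a uniform prior. This is an assumption: the internals of SAMPLING are not described here, and the function names are hypothetical. The counts used in the usage note (1 of 59 and 30 of 60) are inferred from the reported frequencies rather than stated in the text.

```python
from math import comb


def beta_cdf(x, a, b):
    """CDF of Beta(a, b) for positive integers a and b.

    Uses the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a).
    """
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a, n + 1))


def credible_region(k, n, level=0.95):
    """Central credible interval for a frequency, given k hits in n samples.

    Assumes a uniform prior, so the posterior is Beta(k + 1, n - k + 1).
    """
    a, b = k + 1, n - k + 1
    tail = (1 - level) / 2

    def quantile(p):
        # Bisection on the monotone CDF.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if beta_cdf(mid, a, b) < p:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    return quantile(tail), quantile(1 - tail)
```

Under these assumptions, `credible_region(1, 59)` gives roughly 0.004 to 0.089 and `credible_region(30, 60)` gives roughly 0.377 to 0.623, in line with the regions reported for frequencies of 0.017 and 0.500.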
Haplogroup frequencies
Statistical analyses
A principal component (PC) analysis [Numerical Taxonomy and Multivariate Analysis System, NTSYSpc 2.02i, by Rohlf (2002)] was performed using the haplogroup frequencies of the Indo-Sunni and Indo-Shia together with 44 previously studied populations from the Middle East, Central Asia, North East Africa, and India (referenced in Table 1). Other statistical analyses included the computation of standard diversity indices and gene diversity scores (Nei, 1987) as well as an Ewens-Watterson homozygosity test for selective neutrality (Ewens, 1972; Watterson, 1978). These procedures were based on either the HVI alone (nps 16019–16385) or the HVI and HVII regions together (nps 16019–16385 and nps 00073–00400). In addition, an analysis of molecular variance (AMOVA) was conducted on haplogroup frequency distributions of the extended data set including reference populations (Table 1) to assess population genetic structure. These statistical procedures were executed using the Arlequin Version 2.0 package (Schneider et al., 2000).
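The gene diversity score of Nei (1987) and the observed homozygosity used in the Ewens-Watterson test can be sketched in a few lines of Python. This is a minimal illustration of the standard formulas, not the Arlequin implementation; the function names are hypothetical, and the haplotype counts in the usage note are a hypothetical configuration (54 haplotypes seen once and 3 seen twice, giving 57 haplotypes in 60 individuals).

```python
def observed_homozygosity(counts):
    """Sum of squared haplotype frequencies (observed F in the Ewens-Watterson test)."""
    n = sum(counts)
    return sum((c / n) ** 2 for c in counts)


def gene_diversity(counts):
    """Nei's (1987) unbiased gene diversity: n / (n - 1) * (1 - sum of p_i squared)."""
    n = sum(counts)
    return n / (n - 1) * (1 - observed_homozygosity(counts))
```

For the hypothetical counts `[1] * 54 + [2] * 3`, the observed homozygosity is about 0.0183 and the gene diversity about 0.9983. The significance of the Ewens-Watterson test additionally requires the expected homozygosity under neutrality, which is obtained by simulation and is not covered by this sketch.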
Population   Haplogroup   Frequencies   0.95 credible region
SHIA         D            0.017         0.004–0.089
SHIA         M3           0.102         0.048–0.205
SHIA         M4           0.034         0.010–0.115
SHIA         M5           0.067         0.027–0.159
SHIA         M6           0.017         0.004–0.089
SHIA         M9/M30       0.017         0.004–0.089
SHIA         M25          0.034         0.010–0.115
SHIA         M*           0.328         0.217–0.450
SHIA         N*           0.034         0.010–0.115
SHIA         R*           0.119         0.028–0.162
SHIA         R5           0.017         0.004–0.089
SHIA         R6           0.034         0.010–0.115
SHIA         U            0.119         0.028–0.162
SHIA         U7           0.034         0.010–0.115
SHIA         X            0.034         0.010–0.115
SUNNI        M*           0.500         0.377–0.623
SUNNI        N*           0.034         0.010–0.113
SUNNI        R*           0.269         0.171–0.391
SUNNI        R6           0.017         0.004–0.088
SUNNI        U            0.180         0.106–0.300

Reduced median networks of the relevant haplogroups observed in the Indo-Sunni and Indo-Shia collections (M, R, and U) were constructed by the NETWORK 4.1 software program developed by Fluxus Technology (www.fluxus-engineering.com). Times to the most recent common ancestor (TMRCA) for mtDNA haplogroups were estimated according to the methods of Forster et al. (1996). As in previously published studies (Richards et al., 2000; Saillard et al., 2000; Salas et al., 2002; Kivisild et al., 2004), the average mutation rate used is one nucleotide change in 20,180 years for nps 16090–16365.
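The dating step can be illustrated with a short Python sketch of the ρ (rho) statistic, taken here as the average number of mutational steps separating each sampled sequence from the root haplotype of the network, converted to years with the rate of one change per 20,180 years. The function names and the step counts in the usage note are hypothetical, and the sketch omits the standard error attached to the published estimates.

```python
# One nucleotide change per 20,180 years for nps 16090-16365, as stated in the text.
YEARS_PER_MUTATION = 20180


def rho(steps_from_root):
    """Average number of mutational steps from the sampled sequences to the root."""
    return sum(steps_from_root) / len(steps_from_root)


def tmrca_years(steps_from_root, years_per_mutation=YEARS_PER_MUTATION):
    """Convert rho into a time to the most recent common ancestor, in years."""
    return rho(steps_from_root) * years_per_mutation
```

For four hypothetical sequences lying 3, 4, 2, and 3 steps from the root, ρ is 3.0 and the estimate is 60,540 years. Read in the other direction, a TMRCA of 65,600 years corresponds to a ρ of about 3.25 under this rate.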
RESULTS
Haplogroup distribution
The resulting haplogroup frequencies along with the Bayesian 0.95 credible regions (0.95 CR) of the 119 individuals comprising the two Muslim Indian populations are presented in Table 2 (individual haplotype information for the Indo-Sunni and Indo-Shia is provided in online Tables 1a and 1b at http://www.fiu.edu/~herrerar/Sunni_and_Shia_Haplotype_frequencies.xls). This table, along with the phylogeographic map of Figure 1 (based on the haplogroup frequencies of the two studied populations together with those of the reference studies), illustrates that the predominant mtDNA haplogroups in the Indo-Sunni and Indo-Shia are M (0.500, 0.95 CR: 0.377–0.623 and 0.593, 0.95 CR: 0.465–0.709, respectively), R (0.317, 0.95 CR: 0.213–0.443 and 0.169, 0.95 CR: 0.095–0.285, respectively), and U (0.150, 0.95 CR: 0.082–0.262 and 0.153, 0.95 CR: 0.083–0.266, respectively). These frequencies are similar to those of the other Indian populations surveyed (Fig. 1). Especially noteworthy is the high frequency of haplogroup M in India (0.549 overall), perhaps reflecting its ancient status within this subcontinent (Quintana-Murci et al., 1999). However, as seen in Table 2, there are some differences in the M haplogroup distribution between the Indo-Sunni and Indo-Shia. The Indian-specific M sub-haplogroups M3, M4, M5, M6, and M25 (Metspalu et al., 2004) are present in the Indo-Shia [0.102 (0.95 CR: 0.048–0.205), 0.034 (0.95 CR: 0.010–0.115), 0.067 (0.95 CR: 0.027–0.159), 0.017 (0.95 CR: 0.004–0.089), and 0.034 (0.95 CR: 0.010–0.115), respectively] but not in the Indo-Sunni.
Several uncommon mutations (16007; 16009; 16008;
16025; 16095) are detected (sometimes in combination
Fig. 1. Phylogeographic map of India (and other regions). This phylogeographic distribution is based on the geographical frequency distribution of the major mtDNA haplogroups (D, M, N, R, U, X and others) with N and M including only those samples not
subsumed by their respective sub-haplogroups (i.e. M: D, N: R, U and X, R: U).
Fig. 2. Principal Component Analysis. Middle East: Persia (PER), Iran (IRA), Gilaki (GIL), Lur (LUR), Kurdish (KUR), Iraq
(IRQ), Arabia (ARA), Syria (SYR), Palestine (PAL), Anatolia (ANA), Jordan (JOR). Lower Arabian Peninsula: Oman (OMA),
Qatar (QAT), United Arab Emirates (UAE), Yemen (YEM). Central East Asia: Georgia (GEO), Central Asia (CAS), Turkmen,
Turkmenistan (TUR), Armenia (ARM), Shugnun, Tajikistan (SHU). India: Shia (SHI), Sunni (SUN), Andaman (AND), Andhra Pradesh (APR), Bangladesh (BAN), Bhargava (BHA), Brahmin (BRA), Chaturvedi (CHA), Gujarat (GUJ), Karnataka (KAR), Kerala
(KER), Maharashtra (MAH), Rajasthan (RAJ), Rajbhansi (RAB), Reddy (RED), Sri Lanka (SRI), Tamil Nadu (TAM), Thogataveera
(THO), West Bengal (WBE). Pakistan: Karachi (KCH), Hunza (HUN), Makrani (MAK), Brahui (BRH), Parsi (PAR). North Africa:
Gurna (GUR), Egypt (EGY).
Grouping
Fst      Among groups         Among populations within groups   Within populations
         f       P-value      f       P-value                   f       P-value
0.308    24.68   0.00000      4.22    0.00000                   71.10   0.00000
0.066    4.13    0.00000      3.43    0.00000                   92.44   0.00000
0.298    8.03    0.00782      21.80   0.00000                   70.18   0.00000
0.121    0.63    0.27370      11.56   0.00000                   87.81   0.00000
0.071    2.70    0.01271      4.46    0.00000                   92.84   0.00000
Sunni   HVI            50   41   0.0304   0.0291   0.9020   0.8980   0.9894 ± 0.0071
Sunni   HVI and HVII   60   57   0.0183   0.0184   0.8720   0.8720   0.9983 ± 0.0034
Shia    HVI            60   51   0.0238   0.0227   0.9480   0.9440   0.9927 ± 0.0052
Shia    HVI and HVII   55   48   0.0267   0.0235   0.9980   0.9980   0.9912 ± 0.0070
0.998 and 0.991 for the Indo-Sunni and the Indo-Shia, respectively). Also, as shown in Table 4, the Indo-Muslim populations yielded nonsignificant selection values for the Ewens-Watterson neutrality test.
DISCUSSION
The Indo-Sunni and Indo-Shia enclaves in North Central India practice the Islamic religion and culture and are thus culturally distinct from the neighboring Indian castes and tribal groups. However, the question remains as to whether the Arab-based cultural elements of these two North Indian Muslim groups are reflective of their primary genetic affinity. In other words, do the Sunni and Shia from North Central India share a higher degree of mtDNA similarity with the groups from the Middle East, where the Islamic religion originated (7th century AD in Saudi Arabia) and flourished? Alternatively, is the mtDNA composition closer to that of populations from Central Asia (stronghold of the Mongol warlords who had adopted the Islamic faith) or the surrounding tribal and caste Indian groups? To answer these questions, mitochondrial DNA sequence information (HVI and HVII regions) and mtDNA coding polymorphism status were obtained for 119 unrelated Indo-Sunni (60) and Indo-Shia (59) individuals.

All     81.9 ± 16.7   74.0 ± 9.2    69.8 ± 11.1
Sunni   65.6 ± 12.7   57.7 ± 9.9    77.8 ± 15.3
Shia    34.8 ± 10.8   43.7 ± 13.0   73.0 ± 18.4
M haplogroup in Western and Central Asia (Pakistan excluded). Noteworthy is the substantial overlap between the TMRCAs of the Indo-Sunni (65,600 ± 12,700 ybp) (Table 5) and other Indian M lineages (53,000 ± 7,000 ybp in Quintana-Murci et al., 1999), which indicates a similar degree of M lineage diversity and age. Also, these coalescent times are close to that of the initial out-of-Africa dispersal of modern humans (54,000 ± 8,000 ybp in Forster et al., 2001).
The somewhat younger TMRCA of the Indo-Shia M haplogroup (34,800 ± 10,800 ybp) (Table 5) may reflect a more recent penetration into the Indian subcontinent. Alternatively, the younger TMRCA may be due to a lesser degree of admixture with the neighboring Indian populations. However, the significant levels of the Indian-specific types M3, M4, M5, and M25 (0.237, 0.95 CR: 0.147–0.360; χ² P-value < 0.0005) in the Indo-Shia versus their absence in the Indo-Sunni suggest otherwise. A third possibility, in which a recent episode of random genetic drift reduced the variability of the M sub-haplogroups in the Indo-Shia population (thus resulting in a younger TMRCA), is discounted by the equally high gene diversity indices of the Indo-Shia (0.9927 and 0.9912 for HVI and for HVI and HVII combined, respectively) versus the Indo-Sunni (0.9894 and 0.9983 for HVI and for HVI and HVII combined, respectively).
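The contrast between the two sects in the Indian-specific M types can be checked with a simple 2 × 2 chi-square test, sketched below in Python. This is an illustration under stated assumptions: the exact test configuration used in the study is not given, the function name is hypothetical, and the counts in the usage note (14 of 59 Indo-Shia, from a frequency of 0.237, versus 0 of 60 Indo-Sunni) are inferred from the reported frequencies and sample sizes.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic and its upper-tail P-value; for 1 df,
    P(X > x) = erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, erfc(sqrt(chi2 / 2))
```

With the inferred counts, `chi_square_2x2(14, 45, 0, 60)` returns a statistic of about 16.1 and a P-value below 0.0005, in agreement with the significance level quoted above.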
The substantial R haplogroup levels in both the Indo-Sunni and the Indo-Shia (0.317, 0.95 CR: 0.213–0.443 and 0.169, 0.95 CR: 0.095–0.285, respectively) (Table 2) resemble those of the surrounding Indian groups (Fig. 1). In contrast, there appears to be a dearth of this haplogroup elsewhere (Africa, Europe, Central Asia, North East Africa, and East Asia) except for some very low levels (3%) in Oman, Persia, Qatar, UAE, and Yemen (Al-Zahery et al., 2003; Quintana-Murci et al., 2004; Rowold et al., 2007).
The alignment of the Indo-Sunni and the Indo-Shia with other Indian factions is also supported by the lack of typical West Asian haplogroups such as H, I, J, K, and T (Richards et al., 1998; Richards et al., 2000) in the Indo-Muslims. Furthermore, the extremely low levels of haplogroups U7 and W in the Indo-Muslims (two haplogroups common in Iran and Pakistan and/or polymorphic in the far northwest region of India) do not indicate substantial maternal admixture from North and West Asia. These findings are also reflected in the segregation of populations in the PC plot (Fig. 2) and corroborated by the lack of specific lineage sharing (differing in at least four mutations) between the Indo-Sunni and the Indo-Shia Muslims on the one hand and six Middle East Asian or North East African populations on the other (Jordan, Oman, Qatar, United Arab Emirates, Yemen, and Egypt; from Rowold et al., 2007).
SUMMARY
This study presents several notable findings. With respect to mtDNA haplogroup distribution, both the Sunni and Shia populations of Uttar Pradesh display a much higher affinity to indigenous Indian populations than to Middle Eastern and Central Asian groups. In addition, the Indo-Shia displays a significantly higher level of the Indian-specific M sub-haplogroups (χ² P-value < 0.001). The Indo-Sunni, on the other hand, exhibits a significantly greater R frequency (χ² P-value < 0.025), which, like the Indian-specific sub-haplogroups, is also more common in India but not necessarily related to social status (upper caste, lower caste, and tribal) (Kivisild et al., 2003). In contrast, the available Y chromosome data suggest that the Indo-Shia population exhibits higher Middle East Asian and lower North Indian Y chromosomal components compared with those of the Indo-Sunni. Taken together, the mtDNA and Y chromosome data add support to the theory that the current Sunni and Shia communities of Uttar Pradesh in northern India are mostly descended from Hindu converts. For the Indo-Shia, however, the paternal haplogroup signature suggests a greater foreign Y chromosome contribution, possibly from the East or Central Asia, compared with mtDNA. These results corroborate a scenario in which mixed marriages in the past between Muslim males and Hindu females engendered the majority of the extant Muslim inhabitants.
Overall, our data suggest that the Y chromosome and the mtDNA haplogroup composition of the Indo-Sunni align this group closely to the neighboring Indian populations. MtDNA haplogroup differences between the Indo-Shia and Indo-Sunni do not signal a differential contribution of foreign mtDNA to either sect. Yet, they may be indicative of unique population dynamics with native castes and tribes.
LITERATURE CITED
Agrawal S, Khan F, Pandey A, Tripathi M, Herrera RJ. 2005.
YAP, signature of an African-Middle Eastern migration into
northern India. Curr Sci 88:1977–1980.
Al-Zahery N, Semino O, Benuzzi G, Magri C, Passarino G, Torroni A, Santachiara-Benerecetti AS. 2003. Y-chromosome and