ESTIMATING LEVELS OF GENE FLOW IN NATURAL POPULATIONS

MONTGOMERY SLATKIN
Department of Zoology, NJ-15, University of Washington, Seattle, Washington 98195
Manuscript received October 14, 1980
Revised copy received June 19, 1980
ABSTRACT

The results from a simulation model of selection, mutation and genetic drift in a geographically subdivided population are presented. The infinite-alleles mutation model of KIMURA and CROW (1964) is assumed, and both advantageous and deleterious mutations are considered. It is shown that the average frequency of an allele conditioned on the number of local populations it appears in (the conditional average frequency) is approximately independent of both the selection intensity and the mutation rate assumed, but depends strongly on the overall level of gene flow. This result justifies the use of the conditional average frequency to obtain a rough estimate of the level of gene flow in a subdivided population. Data from 16 species are presented and discussed. There are large differences in the conditional average frequencies of different species, although there is some consistency within taxa. Some species apparently have high levels of gene flow and others, particularly salamanders, have low levels. Alternative explanations for the patterns found in the data are considered.

The importance of gene flow in evolution is not fully agreed upon, although a variety of strong views are held. One view, represented by MAYR (1963) and STANLEY (1979), is that gene flow is common and that a small amount of gene flow among different parts of a species' range effectively unifies the species and significantly affects the genetic changes in each part of the range. The other view, represented by EHRLICH and RAVEN (1969), is that gene flow is uncommon and that natural selection acts more or less independently in each part of a species' range. While both of these views are probably correct for some species, there is no way to determine the importance of gene flow in natural populations because there is no direct way to estimate levels of gene flow. Indirect estimates of the level of gene flow obtained by measuring the movement of marked individuals are difficult to interpret. Levels of gene flow can be severely underestimated, through the movement of unmarked individuals or unmarkable life stages, or severely overestimated, because individuals may move but not breed. Also, marking itself may affect dispersal. Direct estimates using visible mutants, as obtained by BATEMAN (1947a,b), are extremely difficult to obtain in natural populations.
In this paper, I will present a new method for using allele frequencies in natural populations to estimate the overall level of gene flow. This method is related to that of WRIGHT, DOBZHANSKY and HOVANITZ (1942) and YOKOYAMA (1979) in that it uses similarities of allele frequencies in different populations, but unlike those methods, which depend on recessive lethal alleles, this method requires no special assumption about the selection acting. Therefore, data from different loci can be combined to form a single overall estimate of the level of gene flow. The method described below is based on computer simulations of the combined processes of genetic drift, natural selection, mutation and gene flow. I will first present the simulation model and its predictions and then the patterns in allele frequencies found in electrophoretic surveys of natural populations.

Genetics 99: 323-335 October, 1981.

THE SIMULATION MODEL

The simulation model is a direct descendant (with modification) of the models of SLATKIN and CHARLESWORTH (1978) and SLATKIN (1980). The principal difference is that the model discussed here is of a locus with mutations to a potentially infinite number of alleles.

A population of a diploid species is assumed to comprise d demes, numbered 1 to d, with deme i being of effective and actual size N_i. In most of the simulations, all of the N_i were equal, but in some cases the N_i were chosen from a Poisson distribution with mean N̄, with the additional constraint that N_i > 0. Each generation consists of four steps: mutation, genetic drift, gene flow and natural selection.

The gene flow among demes depended both on the geometric arrangement of demes and on the migration rates assumed for each deme. The geometry of the population was determined by a list of demes from which each deme received migrants. Deme i was assumed to have a fraction 1 - m_i of its gametes derived from itself and a fraction m_i chosen at random from the demes from which it received migrants. In the simplest case, all the m_i are equal and there is a single parameter specifying the level of gene flow. In other cases, the m_i at the beginning of a simulation were chosen from a normal distribution with a specified mean and variance.

There were two models of natural selection considered: the steady flux model with no dominance and the uniform overdominance model. For the steady flux model, to simplify the specification of the parameters of the model, each allele had associated with it a fitness. The fitness of a diploid individual was the product of the fitnesses of its two alleles. When an allele of fitness w mutated to a new allele, the fitness of the new allele was w(1 + s), where s is a parameter specified at the beginning of the simulation. Thus, there was no upper or lower bound to the relative fitnesses of the alleles. If s > 0, there could be a steady increase of the average fitness, and if s < 0, there could be a steady decrease. If s = 0, this model is also the infinite-alleles, neutral alleles model. This selection model is discussed in a later paper (SLATKIN submitted). In the second selection model, the model of uniform overdominance introduced by KIMURA and CROW (1964), each mutation is to a distinct allele and every heterozygote has a fitness of (1 + s) relative to every homozygote. If s > 0, there is overdominance, and if s < 0, there is underdominance. The infinite-alleles, neutral alleles model of KIMURA and CROW (1964) is a special case of this selection model with s = 0.
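The four steps of a generation under the steady flux model can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the program used for the simulations: the function and argument names are hypothetical, only the steady flux selection model is implemented, migrants are drawn from the average allele frequencies of a deme's source demes, deme sizes stay fixed, and the order inside a generation (mutation, gene flow, selection, drift) is an assumption. With multiplicative fitnesses and random union of gametes, the marginal fitness of allele a is w_a times the mean allelic fitness w̄, so deterministic selection changes its frequency from p_a to p_a w_a / w̄; that is the update used below.

```python
import random
from collections import Counter


def next_generation(demes, fitness, m, sources, mu, s, rng, new_allele):
    """One generation of a hypothetical steady-flux, infinite-alleles simulation.

    demes      -- list of Counter: allele label -> copies in deme i (2*N_i in all)
    fitness    -- dict: allele label -> allelic fitness w (updated in place)
    m          -- list of migration rates m_i
    sources    -- sources[i] lists the demes that send migrants to deme i
    mu, s      -- mutation rate and steady-flux selection parameter
    rng        -- a random.Random instance
    new_allele -- callable returning a label never used before
    """
    sizes = [sum(d.values()) for d in demes]  # 2*N_i gene copies, kept constant

    # 1. Mutation: each copy mutates with probability mu to a brand-new allele
    #    whose fitness is w * (1 + s).
    freqs = []
    for deme, size in zip(demes, sizes):
        after = Counter()
        for allele, count in deme.items():
            n_mut = sum(rng.random() < mu for _ in range(count))
            after[allele] += count - n_mut
            for _ in range(n_mut):
                child = new_allele()
                fitness[child] = fitness[allele] * (1 + s)
                after[child] += 1
        freqs.append({a: c / size for a, c in after.items() if c > 0})

    new_demes = []
    for i, size in enumerate(sizes):
        # 2. Gene flow: fraction 1 - m_i from deme i itself, fraction m_i from
        #    the listed source demes, each source weighted equally.
        pool = Counter({a: (1 - m[i]) * p for a, p in freqs[i].items()})
        for j in sources[i]:
            for a, p in freqs[j].items():
                pool[a] += m[i] * p / len(sources[i])
        # 3. Selection: multiplicative fitnesses plus random union of gametes
        #    give p_a' = p_a * w_a / w_bar.
        alleles = list(pool)
        w_bar = sum(pool[a] * fitness[a] for a in alleles)
        weights = [pool[a] * fitness[a] / w_bar for a in alleles]
        # 4. Drift: sample the 2*N_i gene copies of the next generation.
        new_demes.append(Counter(rng.choices(alleles, weights=weights, k=size)))
    return new_demes


if __name__ == "__main__":
    rng = random.Random(1)
    labels = iter(range(1, 10**9))
    d, two_n = 10, 50  # island model: 10 demes of N = 25 diploids
    sources = [[j for j in range(d) if j != i] for i in range(d)]
    demes = [Counter({0: two_n}) for _ in range(d)]  # start fixed for one allele
    fitness = {0: 1.0}
    for _ in range(500):
        demes = next_generation(demes, fitness, [0.01] * d, sources,
                                1e-3, 0.01, rng, lambda: next(labels))
    print([len(deme) for deme in demes])  # number of alleles in each deme
```

In the usage block, sources[i] lists every other deme, which gives an island model; a stepping-stone geometry would list only neighbouring demes.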

Selection in each deme was modeled by deterministically modifying the gamete frequencies. Random mating was simulated by randomly choosing 2N_i alleles to form the next generation.

For each simulation, the following parameters were specified: (1) the number of demes, d, and the links among the demes, which determined the geometry; (2) the mean and variance of the migration rates, m̄ and V_m (with the constraint that all m_i > 0); (3) the deme size, N, or the average deme size, N̄; (4) the selection model and selection coefficient, s; and (5) the mutation rate, μ. At the beginning of each simulation, the entire population was fixed for one allele. Then the simulation proceeded for a number of generations equal to the inverse of the mutation rate before any information was accumulated. That seemed to be sufficient time for approximately stationary conditions to be reached; waiting longer before starting to accumulate information did not significantly alter the results. In most cases, the simulations continued for at least ten times the initial waiting period, but sometimes much longer. To make the program somewhat more efficient, information was accumulated only at specified intervals (every 50 generations).

The results from these simulations were anticipated from those for the corresponding simulations of a two-allele model. The results from the two-allele model that are relevant for the present discussion are as follows. In SLATKIN (1980), I defined the occupancy number of an allele to be the number of demes in which it is present, and found that it depended on both the level of gene flow and the geometric arrangement of demes. Moreover, I found that the average frequency of a mutant allele computed for different occupancy numbers was approximately independent of the selection acting on the mutant. That was true for mutants with different degrees of dominance, including underdominance and overdominance. I called this frequency the conditional average frequency, p̄(i). I found that if small samples of alleles were taken from larger demes, the dependence of p̄(i) on the selection coefficients was determined by the deme size, not by the sample size. Those results could not be applied to data because they were found for only a two-allele model in which one of the alleles was known to be the mutant. However, those results did serve as a guide to analyzing the infinite-alleles simulation model described above.

For the infinite-alleles model, p̄(i) was computed for all alleles present in the population each time information was accumulated. I found the same pattern in results from the infinite-alleles model as in those from the two-allele model. There was some slight dependence of p̄(i) on the selection, but the dependence was less for larger deme sizes. The weak dependence of p̄(i) on the selection coefficients is illustrated in Figure 1a. The intensity of selection in Figure 1a is larger than suggested by the values of Ns: because of the gene flow, a more appropriate measure of selection intensity is dNs. For s > 0, larger values of s produced the same pattern. For s < 0, larger values of |s| resulted in too few or no alleles reaching intermediate occupancy numbers. This does not mean that the pattern in p̄(i) differs for strongly deleterious mutations, only that no pattern at all could be found by the simulation method. The dependence of p̄(i) on the mutation rate was weaker than the dependence on the selection coefficients (Figure 1b), and variation in migration rates made little or no difference (Figure 1c). Figure 1d shows the results for three replicates each of two cases with different parameter values. While there is more variation in the p̄(i) for s < 0, the results are sufficiently reproducible that it is reasonable to use the time average to estimate the expected value of p̄(i). Similar results were found for the case of uniform overdominance, for populations with the N_i chosen from a Poisson distribution, and for models in which the selection coefficient associated with each allele was randomly chosen.

As anticipated, p̄(i) did depend on the migration rate, as shown in Figure 2a. The effect of the geometric arrangement of demes was also the same as for the two-allele model: more restricted migration patterns, such as a one-dimensional stepping-stone model, produced the same form of the p̄(i) as the island model with a smaller migration rate (Figure 2b).

FIGURE 1. Conditional average frequency, p̄(i), plotted against occupancy number i, for different simulations of an island model (d = 10 and N = 25). Part a: μ = 10⁻⁴ (steady flux selection model), curves for different values of s. Part b: m = 0.01, curves for different mutation rates. Part c: μ = 10⁻⁴, curves for different variances in migration rate, V_m. Part d: three replicates each of two cases with different parameter values. In parts a, b and c, each curve is the average of the p̄(i) over a single replicate; there were at least 40,000 generations in a replicate and 801 samples, with information accumulated every 50 generations. In part d, the actual number of samples varied among replicates.
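The occupancy number and the conditional average frequency defined above are straightforward to compute from a table of allele frequencies. The minimal Python sketch below assumes one dictionary of allele frequencies per deme; the function name and data layout are illustrative, not taken from the original program. Because every allele in occupancy class i is present in exactly i demes, averaging each allele's mean frequency gives the same value as pooling all of that class's (allele, deme) frequencies.

```python
from collections import defaultdict


def conditional_average_frequency(deme_freqs):
    """Return {i: p_bar(i)} from a list with one {allele: frequency} dict per deme.

    The occupancy number i of an allele is the number of demes in which it is
    present; p_bar(i) is the mean frequency, in the demes where they occur, of
    all alleles with occupancy number i.
    """
    present = defaultdict(list)  # allele -> its frequencies in demes where present
    for freqs in deme_freqs:
        for allele, p in freqs.items():
            if p > 0:
                present[allele].append(p)
    classes = defaultdict(list)  # occupancy number -> per-allele mean frequencies
    for ps in present.values():
        classes[len(ps)].append(sum(ps) / len(ps))
    return {i: sum(v) / len(v) for i, v in sorted(classes.items())}


if __name__ == "__main__":
    demes = [{"A": 0.9, "B": 0.1}, {"A": 1.0}, {"A": 0.7, "C": 0.3}]
    print(conditional_average_frequency(demes))
```

For these three hypothetical demes, alleles B and C each occur in one deme, so p̄(1) = (0.1 + 0.3)/2 = 0.2, and allele A occurs in all three, so p̄(3) = (0.9 + 1.0 + 0.7)/3 ≈ 0.867. Forming a time average by pooling the alleles of every snapshot taken at 50-generation intervals is one plausible reading of the procedure and is an assumption here.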

FIGURE 2. Conditional average frequency, p̄(i), for neutral alleles, plotted against occupancy number i; curves are for different migration rates, m. In both parts, μ = 10⁻⁴, d = 10 and N = 25. Part a: island model. Part b: one-dimensional stepping-stone model.

It is not obvious why p̄(i) should depend only on the migration rate. One way to understand why that is so is to note that the probability that an allele will be carried by a migrant depends only on its frequency in a deme and not on the mechanisms that caused the allele to have that particular frequency. Selection and mutation can change frequencies in each deme, but do not have a significant direct effect on the occupancy numbers. Since a change in occupancy number occurs only when alleles are carried by migrants to demes where the allele is absent, it is the set of frequencies and the migration rates that determine the likelihood of a change in occupancy number. As expected, and as was shown in SLATKIN (1980), it is the value of Nm rather than m alone that determines the importance of gene flow.

The conditional average frequency, p̄(i), should not be confused with the probability that an allele is found in different numbers of demes, which I called the occupancy distribution (SLATKIN 1980). That probability does depend strongly on selection intensity and mutation rate, as well as on the migration rate. In a later paper (SLATKIN submitted), I show one way in which this information can be used.

PATTERNS IN THE DATA

The preceding results justify the use of p̄(i) to estimate the overall level of gene flow in natural populations. No special assumptions about either the selection or the population structure are necessary. Unfortunately, what would be desirable is missing: an analytic theory that would allow the estimation of the true or effective migration rate using a statistic derived from p̄(i). I could find no such estimator either from an analytic model or from a statistical analysis of the simulation results. A full statistical analysis of the simulation results is not possible, and at this time we cannot do much more than look for a qualitative resemblance between the simulations and the data. However, the patterns in the data are sufficiently clear that it is possible to distinguish between high, intermediate and low levels of gene flow in different species, even though the actual rates of migration cannot be estimated.

Figure 3 shows the results of analyzing several data sets available in the literature on allele frequencies in samples from different parts of species' ranges (see also Table 1). An illustration of how the p̄(i) were computed is given in Table 2. Two comments on the possible bias introduced by sampling are appropriate. First, in all cases but one, allele frequencies were estimated using a single electrophoretic method. This technique is known to underestimate the true number of alleles for some loci at least. However, for the one case in which allele frequencies were estimated using a combination of electrophoretic techniques (the data on Drosophila pseudoobscura kindly provided by R. SINGH), the pattern is the same as found using a single technique (Figure 3a). Second, in all the data shown in Figure 3, almost all samples were of 10 or more individuals. I found in the simulations that, if samples of at least that size were used, the differences between high and low migration rates would be apparent.

FIGURE 3. Conditional average frequencies for various species (see Table 1 for details on each species). In each case, p̄(i) is plotted against i/d, where d is the number of locations sampled, to permit comparison of different species. Uneven spacing of points on a curve is due to the absence of alleles with some of the intermediate occupancy numbers.

The smooth form of the simulation results shown in Figures 1 and 2, as contrasted with the less than smooth form of the curves in Figure 3, is due to the large number of generations for which the simulations could be run. If only a few samples are taken from the simulations, the results are more like the curves based on the data. While the limitations of the electrophoretic method should be borne in mind, the differences between species are large enough that it would be unreasonable to attribute them completely to sampling artifacts.
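Rescaling the occupancy number by the number of sampled locations, as done for Figure 3, can be sketched starting from raw sample counts (the kind of data illustrated in Table 2). The function name and input layout below are hypothetical; sample frequencies are taken as the number of copies of an allele divided by the number of genes sampled at that location.

```python
from collections import defaultdict


def normalized_curve(sample_counts):
    """Return sorted (i/d, p_bar(i)) points from per-location allele counts.

    sample_counts -- one {allele: number of copies in the sample} dict per location
    """
    d = len(sample_counts)
    present = defaultdict(list)  # allele -> sample frequencies where it was seen
    for counts in sample_counts:
        total = sum(counts.values())  # 2n genes sampled at this location
        for allele, c in counts.items():
            if c > 0:
                present[allele].append(c / total)
    classes = defaultdict(list)  # occupancy number -> per-allele mean frequencies
    for freqs in present.values():
        classes[len(freqs)].append(sum(freqs) / len(freqs))
    return [(i / d, sum(v) / len(v)) for i, v in sorted(classes.items())]


if __name__ == "__main__":
    samples = [{"a": 18, "b": 2}, {"a": 20}, {"a": 10, "b": 10}, {"a": 20}]
    print(normalized_curve(samples))
```

Dividing i by d puts species sampled at different numbers of locations on a common horizontal axis running from 1/d to 1.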



The results are grouped into three categories: species with apparently high, medium and low levels of gene flow. Results for species with high levels of gene flow are consistent with the simulation results for an island model with equal deme sizes and equal migration rates for Nm > 1. The results for species with low levels of gene flow are consistent with the same model with Nm < 1. In comparing Figures 3a and 3b, it is notable that the p̄(i) are similar for different species in the same genus or in ecologically similar genera. The p̄(i) for Drosophila pseudoobscura in the western United States is similar to that for D. willistoni in Venezuela; both appear to have high levels of gene flow. The salamanders, two in the genus Plethodon in the eastern United States and two in the genus Batrachoseps in California, have either low or intermediate levels of gene flow.

Despite the limitations of the electrophoretic methods, these results provide some evidence that levels of gene flow in natural populations differ greatly among species. Some species have sufficiently high rates of gene flow that they may be nearly panmictic. For those species, the view of MAYR and STANLEY that the species is effectively a single evolutionary unit may be valid. For other species, particularly salamanders, the levels of gene flow are sufficiently low that natural selection and genetic drift may occur more-or-less independently in each deme. For such species, the view of EHRLICH and RAVEN may be more correct. In such species, gene flow may not be responsible for genetic or morphological uniformity, and other causes must be sought. While it is possible that the detection of much more genetic variability using different biochemical techniques could drastically alter the patterns shown in Figure 3, SINGH'S (1979) results for D. pseudoobscura and the consistency of the patterns for different species suggest that the qualitative features of the results will not be changed. While these results are hardly conclusive, they do provide a new way to interpret data from natural populations and some hope that levels of gene flow can be estimated without making severe or restrictive assumptions about the species being studied.

DISCUSSION

The qualitative agreement between the simulation results and the data does not necessarily imply that natural populations actually conform to the assumptions of the simulation model. Other explanations of the data must be examined. There are at least two obvious alternative models that could lead to predictions about the form of the p̄(i) curves. The first, the independent deme model, is based on the assumption that there is a balance between selection and mutation in each deme independently and that gene flow among the demes is either nonexistent or too weak to produce any effect on the selection-mutation balance. Migration then plays the role of mutation in introducing new alleles to a local population, but otherwise does not alter allele frequencies in local populations. If the same selection model is assumed in each deme, the conditional average frequency, p̄(i), can be determined directly. If it is assumed that an allele is maintained at frequency x by selection and mutation pressures, then the probability that it will be found at all in a sample of n individuals (2n alleles) is $1 - (1 - x)^{2n}$, and the expected frequency, given that it is found in the sample, is $x/[1 - (1 - x)^{2n}]$.
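Both expressions follow from binomial sampling of 2n genes. The allele is missed only if all 2n genes carry other alleles, and because a sample that misses the allele contributes zero to the sample frequency k/2n, E[k/2n | k > 0] = E[k/2n]/P(k > 0) = x/[1 - (1 - x)^{2n}]. The Python sketch below evaluates both formulas and adds a Monte Carlo check; all function names are illustrative.

```python
import random


def presence_probability(x, n):
    """P(an allele at frequency x appears in a sample of n diploids = 2n genes)."""
    return 1.0 - (1.0 - x) ** (2 * n)


def conditional_expected_frequency(x, n):
    """Expected sample frequency given the allele appears: x / [1 - (1 - x)^(2n)]."""
    return x / presence_probability(x, n)


def simulated_conditional_frequency(x, n, trials, rng):
    """Monte Carlo estimate: mean of k/(2n) over samples in which k > 0."""
    total, hits = 0.0, 0
    for _ in range(trials):
        k = sum(rng.random() < x for _ in range(2 * n))  # copies of the allele
        if k > 0:
            total += k / (2 * n)
            hits += 1
    return total / hits


if __name__ == "__main__":
    rng = random.Random(1)
    print(conditional_expected_frequency(0.05, 10),
          simulated_conditional_frequency(0.05, 10, 20000, rng))
```

For rare alleles the conditional frequency is much larger than x itself, which is why the model's prediction depends on x only through this simple ratio.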

The independent deme model cannot explain two features of the data. First, if x is the same in every deme, as is required by the independent deme model, the model cannot predict that an allele can be found in some but not all demes in high frequencies, even though any distribution of x among alleles can be specified. If x = 0.5, then the probability that the allele will be found in all demes in samples of at least size 10 is $[1 - (0.5)^{20}]^d$. For d = 10 this probability is 0.99999. Therefore, there should be no alleles with intermediate occupancy numbers with average frequencies of 0.5 or greater. Yet the data show many such alleles, even though the sample sizes are often much larger than 10. Of course, if it is assumed that x is different in every deme, then any data can be explained on an ad hoc basis: finding an allele in high frequency in one deme would not imply that it should be present in high frequency in every other deme.

Second, the independent deme model also cannot explain the patterns shown in the "high migration species" (Figures 3a and 3b). In the high migration species there are numerous alleles that have intermediate occupancy numbers, and many of them have very low frequencies in the demes in which they are found (cf. the 0.95 allele of LAP in Table 2). In the independent deme model, if x is small, then the probability that the allele will be present in a sample from any deme is also small. For example, if n = 25, which is typical of the sample sizes for the high migration species, and that probability is 0.076, then, if 10 demes are sampled, the probability of finding even a single copy of the allele in five or more of the demes sampled is less than 0.0005. Yet many such alleles are found.
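The two numerical claims above can be checked directly. The sketch assumes, as the argument does, that demes are sampled independently, so the number of sampled demes in which an allele is detected is binomial with d trials. It starts from the per-deme detection probability of 0.076 quoted for n = 25 rather than from a particular value of x; the helper name is illustrative.

```python
from math import comb


def prob_at_least(k, d, q):
    """P(allele detected in at least k of d independent demes), per-deme probability q."""
    return sum(comb(d, j) * q**j * (1 - q) ** (d - j) for j in range(k, d + 1))


# x = 0.5 in every deme, samples of 10 individuals (20 genes), d = 10 demes:
# probability that the allele is found in all ten samples.
p_all = (1 - 0.5 ** 20) ** 10

# Per-deme detection probability 0.076 (the n = 25 example), d = 10 demes:
# probability that the allele turns up in five or more of them.
p_five_plus = prob_at_least(5, 10, 0.076)

if __name__ == "__main__":
    print(f"{p_all:.5f} {p_five_plus:.6f}")
```

p_all evaluates to about 0.99999 and p_five_plus to about 4.6 × 10⁻⁴, consistent with the bound of 0.0005 used in the argument.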
The second model that could explain the pattern in the data is the radiation model, in which a single deme at some time in the past radiated into several demes that do not now exchange migrants. Alleles present in the initial population could either be lost or become common due to genetic drift. The analysis of this model is complex, but could be carried out by the same methods as used by NEI and LI (1975). The radiation model could not explain the patterns found in high migration species unless it were assumed that the radiation took place recently. The reason is the same as the reason for rejecting the independent deme model: there are too many alleles found with intermediate occupancy numbers and low frequencies, and it is unlikely that an allele would remain at low frequency in several demes independently.

However, the patterns found in the low migration species could be explained by the radiation model. The similarities between demes could either be due to presently acting forces or to remnants of a previous association, and from these data alone there seems to be no way to distinguish between low levels of gene flow that are still occurring and no gene flow. Other types of information could possibly be used to distinguish between the two alternatives. For example, the lizard Lacerta melisellensis (Figure 3d) inhabits islands in the Adriatic, and it seems reasonable that similarities between populations on different islands are due to their common descent from the same mainland population, rather than to continued gene flow among the islands. On the other hand, in salamanders there are no apparent barriers to migration, and GILL (1979) has examples of long-distance movement of individuals. Although actual rates of movement of individuals are unknown, it seems unlikely that there is no gene flow whatever between local populations.

The evolutionary implications of the two explanations of the data for low migration species are nearly the same. In either case, gene flow is not strong enough to prevent divergence of local populations, even for weakly selected or neutral loci.

I thank J. FELSENSTEIN and M. KIRKPATRICK for numerous helpful discussions of this topic and T. NAGYLAKI … LARSON for comments on an earlier version of this paper. I also thank D. … KOEHN for providing copies of their data for analysis. This research has been supported by the National Science Foundation (DEB-7827045), and I have been supported in part by a Public Health Service Research Career Development Award (K01-GM00118).

LITERATURE CITED

AYALA, F. J., M. L. TRACEY, L. G. BARR, J. F. MCDONALD and S. PÉREZ-SALAS, 1974 Genetic variation in natural populations of five Drosophila species and the hypothesis of selective neutrality. Genetics 77: 343-384.
BATEMAN, A. J., 1947a Contamination of seed crops. I. Insect pollination. J. Genet. 48: 257-275.
BATEMAN, A. J., 1947b Contamination of seed crops. II. Wind pollination. Heredity 1: 235-246.
CASE, S. M., 1975 Protein variation in several species of Hyla. Syst. Zool. 24: 281-295.
DUNCAN, R. and R. HIGHTON, 1979 Genetic relationships of the eastern large Plethodon of the Ouachita Mountains. Copeia 1979: 95-110.
EHRLICH, P. R. and P. H. RAVEN, 1969 Differentiation of populations. Science 165: 1228-1232.
GILL, D. E., 1979 Density dependence and homing behavior in adult red-spotted newts Notophthalmus viridescens (Rafinesque). Ecology 60: 800-813.
GORMAN, G. C., M. SOULÉ, S. Y. YANG and E. NEVO, 1975 Evolutionary genetics of insular Adriatic lizards. Evolution 29: 52-71.
GOTTLIEB, L. D., 1975 Allelic diversity in the outcrossing annual plant Stephanomeria exigua ssp. carotifera (Compositae). Evolution 29: 213-225.
HIGHTON, R. and T. P. WEBSTER, 1976 Genetic variation and divergence in populations of the salamander Plethodon cinereus. Evolution 30: 33-45.
KIMURA, M. and J. F. CROW, 1964 The number of alleles that can be maintained in a finite population. Genetics 49: 725-738.
LARSON, A. and R. HIGHTON, 1978 Geographic protein variation and divergence in the salamanders of the Plethodon welleri group (Amphibia, Plethodontidae). Syst. Zool. 27: 431-448.
LEWONTIN, R. C., 1974 The Genetic Basis of Evolutionary Change. Columbia University Press, New York.
MAYR, E., 1963 Animal Species and Evolution. Harvard University Press, Cambridge, Mass.
NEI, M. and W.-H. LI, 1975 Probability of identical monomorphism in related species. Genet. Res. Camb. 26: 31-43.
PATTON, J. L. and S. Y. YANG, 1977 Genetic variation in Thomomys bottae pocket gophers: macrogeographic patterns. Evolution 31: 697-720.
SELANDER, R. K., M. H. SMITH, S. Y. YANG, W. E. JOHNSON and J. B. GENTRY, 1971 Biochemical polymorphism and systematics in the genus Peromyscus. I. Variation in the old-field mouse (Peromyscus polionotus). Studies in Genetics VI. Univ. Texas Publ. 7103.

SINGH, R. S., 1979 Genic heterogeneity within electrophoretic "alleles" and the pattern of variation among loci in Drosophila pseudoobscura. Genetics 93: 997-1018.
SLATKIN, M., 1980 The distribution of mutant alleles in a subdivided population. Genetics 95: 503-523.
SLATKIN, M. and B. CHARLESWORTH, 1978 The spatial distribution of transient alleles in a subdivided population: a simulation study. Genetics 89: 793-810.
SMITH, M. F., 1979 Geographic variation in genic and morphological characters in Peromyscus californicus. J. Mamm. 60: 705-722.
STANLEY, S. M., 1979 Macroevolution: Pattern and Process. W. H. Freeman, San Francisco.
WINANS, G. A., 1980 Geographic variation in the milkfish Chanos chanos. I. Biochemical evidence. Evolution 34: 558-574.
WRIGHT, S., TH. DOBZHANSKY and W. HOVANITZ, 1942 Genetics of natural populations. VII. The allelism of lethals in the third chromosome of Drosophila pseudoobscura. Genetics 27: 363-394.
YANEV, K. P., 1978 Evolutionary studies of the plethodontid salamander genus Batrachoseps. Ph.D. dissertation, University of California, Berkeley.
YANEV, K. P. and D. B. WAKE, 1981 Genic differentiation in a relict desert salamander, Batrachoseps campi. Herpetologica 37: 16-28.
YOKOYAMA, S., 1979 The rate of allelism of lethal genes in a geographically structured population. Genetics 93: 245-262.

Corresponding editor: W. J. EWENS