
Supplementary Materials for

Dual domestications and origin of traits in grapevine evolution

Yang Dong et al.

Corresponding authors: Shaohua Li, shhli@ibcas.ac.cn; Jun Sheng, shengjun@dongyang-lab.org; Wei Chen,
wchenntr@gmail.com

Science 379, 892 (2023)


DOI: 10.1126/science.add8655

The PDF file includes:

Materials and Methods
Supplementary Text
Figs. S1 to S27
References

Other Supplementary Material for this manuscript includes the following:

Tables S1 to S35
MDAR Reproducibility Checklist
Materials and Methods

VS-1 genome assembly. The V. sylvestris plant VS-1 of Tunisian origin (DVIT2426) was obtained
from the grape germplasm and breeding block of the Shanghai Jiaotong University in Shanghai.
Fresh young leaves were collected for the extraction of total genomic DNA using the CTAB Plant
DNA Extraction Kit (Genenode Biotech Co, Beijing). We obtained 49.5 Gb (~100×) PacBio
single-molecule real-time (SMRT) reads and 26.7 Gb (~54×) circular consensus sequencing (CCS)
reads on the PacBio RS II platform from BGI-Wuhan (Wuhan, China) and Berry Genomics (Beijing,
China), respectively. We also obtained a total of 170.67 Gb (~350×) Illumina paired-end
sequencing data and 62.44 Gb Hi-C sequencing data from Novogene (Beijing, China) (table S1).

NextDenovo (v.2.0.beta.1; https://github.com/Nextomics/NextDenovo/; accessed Dec. 27th,
2019) was used to generate the initial PacBio subreads assembly. The NextDenovo assembly
workflow comprises two major steps: 1) NextCorrect: self-correction of PacBio subreads was
conducted with the parameter setting ‘seed_cutoff = 19703, minimap2_options_raw = -x ava-pb
-t 16, sort_options = -m 50g -t 16 -k 50, and correction_options = -p 32’; and 2) NextGraph: 100
rounds of assembly were conducted with random parameter sets, and the assembly with the
longest contig N50 (2.40 Mb) was selected as the primary assembly for further curation and polishing.
The total length of the primary assembly was 713.99 Mb, which was significantly larger than the
expected genome size (~500 Mb). This indicates the presence of redundant sequences in the
primary assembly, which is confirmed by the large proportion of BUSCO duplicated genes
(20.2%).
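
The contig N50 used above to rank candidate assemblies can be computed directly from the contig lengths. The following is a minimal sketch in Python; the function name and the toy lengths are illustrative assumptions and are not part of the NextDenovo workflow.

```python
def contig_n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    N50 is the length of the contig at which the running sum of lengths,
    taken from the longest contig downwards, first reaches half of the
    total assembly length.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy lengths (hypothetical, not taken from the VS-1 assembly).
toy_contigs = [80, 70, 50, 40, 30, 20]
print(contig_n50(toy_contigs))
```

Selecting among several candidate assemblies then amounts to taking the one with the largest value returned by this function.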

We undertook a redundancy filtering step for the primary assembly with a pipeline provided
by Purge Haplotigs (42). Briefly, the pipeline first identifies putative heterozygous contigs through
read-depth analysis. Contigs with a high proportion of bases within the 0.5× read-depth peak were
assigned as putative heterozygous contigs. These putative heterozygous contigs were then subjected
to sequence alignment to identify their allelic companion contigs. Then the identified haplotigs were
removed from the assembly iteratively. According to the read-depth analysis, we selected the
cutoff numbers 10, 68 and 140 for the low, midpoint and high read-depths, respectively. The cutoff
for identifying a contig as a haplotig was set to 60%. This step generated a 468.48 Mb filtered
genome assembly with a contig N50 of 5.24 Mb and 2.4% duplicated genes in the BUSCO
analysis.

Illumina short reads were then used to correct residual errors in the filtered genome assembly.
Illumina reads with 10% Ns, low quality, or derived from PCR artifacts were filtered and trimmed
using the program filter_data_parallel (version 1.5) with parameters ‘-y -w 10 -B 40 -a 3 -b 2 -c 3
-d 2 -q 33’. The resultant clean reads were then mapped to contigs using Burrows-Wheeler Aligner
(BWA) mem (version 0.7.17-r1188) with default parameters. Residual errors in the contigs were
corrected with mapped NGS reads using Pilon (v.1.21) (43) with parameters ‘--fix snps, indels
--changes’.

In order to elongate the polished contigs, we assembled the CCS reads using Canu (v.2.0)
(44) with parameters ‘genomeSize=500m, batOptions=-dg 3 -db 3 -dr 1 -ca 500 -cp 50 -M
250, correctedErrorRate=0.050, -pacbio-hifi ccs.fasta.gz’. Then, the CCS assembly and the
polished contig assembly were aligned to each other using nucmer (from MUMmer v.4.0.0 beta2)
(45) with parameters ‘--mum -D 5’, which was followed by delta-filter with the parameters ‘-i 89
-l 1000’ and show-coords. The polished contigs were then elongated with a home-made Perl script,
filtered again with Purge Haplotigs, and polished again with CCS reads using NextPolish (46) with
default settings. This process yielded an assembly of 477.80 Mb with a contig N50 size of 13.82
Mb (table S2).

The elongated contigs were then anchored into chromosome scale using a Hi-C proximity-
based assembly approach. In total, 62.44 Gb data were used as input for Juicer (v.1.5.6) (47) and
3D-DNA (v. 180922) (48). Illumina Hi-C reads were first aligned to the contigs using BWA-MEM
(v. 0.7.17-r1188). Contigs were ordered and oriented by the 3D-DNA pipeline with parameter ‘--
editor-repeat-coverage 3’. The resultant Hi-C contact matrix was visualized using Juicebox.
Misassemblies and misjoins were manually corrected based on neighboring interactions. Using the
“finalize” section from 3D-DNA, the manually validated assembly was used to build
pseudomolecules and then to be ordered by size. Consequently, 19 high-confidence clusters
representing the haploid chromosomes of V. sylvestris were identified, covering 95.04% of the
whole assembly (fig. S1 and table S2).

We applied a few methods to evaluate the quality of the VS-1 genome assembly. Firstly, we
performed Benchmarking Universal Single-Copy Orthologs (BUSCO, v2.0) (49) analysis to assess
the completeness of the VS-1 genome assembly with the genome mode, the embryophyte_odb10
lineage, and the Arabidopsis species options. About 95% of the 1,375 BUSCO genes are complete
in our VS-1 assembly (table S3). This number is comparable to those of the PN40024 (12X. V2; Ensembl
release 46) Pinot Noir reference genome (96.6%) (50, 51), Chardonnay genome (94.6%) (52), and
Vitis riparia genome (95.7%) (53), which were obtained from the same pipeline. The proportion
of duplicated BUSCO genes in the VS-1 assembly is 1.2%, which is comparable to that of the
PN40024 Pinot Noir reference genome (1.4%) and V. riparia genome (2.8%), but much smaller
than that of the Chardonnay genome (11.0%). Secondly, we evaluated the assembly continuity
with the LTR Assembly Index (LAI) by using LTR_retriever (v.2.8.2) (54). Our VS-1 assembly has a
LAI value of 18.09 (table S3), which is higher than those of the PN40024 Pinot Noir reference
genome (10.63), Chardonnay genome (15.77), and Vitis riparia genome (12.23). This result shows
that the VS-1 genome assembly has the highest continuity among them. Thirdly, we downloaded
39 transcriptomic datasets (26.48 Gb) for V. sylvestris from the NCBI Sequence Read Archive under
the BioProjects PRJNA279229 and PRJNA244752 (table S4). Raw data files in SRA format were
converted to FASTQ format using SRA Toolkit (version 2.9.6, http://ncbi.github.io/sra-tools/)
and then trimmed by Trimmomatic (version 0.36) (55) with default parameters. Trimmed RNA-
seq data were aligned to the VS-1 assembly by HISAT2 (v.2.1.0) (56). The result shows that the
average mapping rate is 93.21%±0.64% across all libraries, demonstrating that the VS-1 assembly
is of high quality. Fourthly, we aligned the VS-1 assembly to the PN40024 reference assembly by
nucmer (from MUMmer, v.4.0.0 beta2) (45) with parameter ‘-c 100’. The results show that, except
for a few small inversions, there is high collinearity between VS-1 and PN40024 (fig. S1). We also
found that the percentage of anchored chromosome length is much higher in VS-1 (95.04%) than
that in PN40024 (87.64%; fig. S1). This is in line with the result that many unassigned contigs in
the PN40024 assemblies correspond to chromosome 7 of the VS-1 genome.

Repeat annotation: Transposable elements were identified in the VS-1 genome using a
combination of homology and de novo-based approaches (57). Tandem repeats in the genome
assembly were identified using TRF (v.4.07b) (58). Well-characterized TEs were identified and
masked by searching against the VS-1 genome assembly using RepeatMasker (v.open-4.0.9;
http://www.repeatmasker.org/) and ProteinRepeatMask (http://www.repeatmasker.org/) with the
Viridiplantae section of the database (release CONS-Dfam_3.0-rb20181026) as the query library.
To identify TEs that were absent in the library, LTRharvest (v.4.9.1) (59) and
LTR_FINDER_parallel (release 09/27/2019) (60) were used to de novo detect LTR
retrotransposons. LTR_retriever (v.2.8) (61) was then used for the accurate identification of LTR-
RTs from those two outputs, and the generation of a non-redundant LTR-RT library. Additionally,
Repeatmodeler (v.2.0) (http://www.repeatmasker.org/) was applied to construct another de novo
repeat library. RepeatMasker was run against the masked genome assembly again, with the merged
de novo repeat library as the query library. The result shows that about 57.12% of the VS-1 genome
assembly is repetitive sequence (table S6). Transposable elements make up the majority
of the repetitive sequence, accounting for 55.98% of the total assembly length.

Prediction of non-coding RNA genes: To predict non-coding RNAs, rRNA genes for plants
were mapped to the VS-1 genome assembly using BLAST (v.2.2.26) (62) with parameters ‘-p
blastn -e 1e-5 -v 10000 -b 10000’. tRNAScan-SE (v.1.3.1) (63) was used to search for tRNA genes
with default parameters. For the identification of miRNA and snRNA genes, Infernal (v.0.81) (64)
was used to search the VS-1 assembly based on covariance models deposited in the Rfam database
(release 9.1) (65) (table S7). In total, we predicted 327 miRNA, 570 tRNA, 183 rRNA, and 586
snRNA genes.

Protein-coding gene annotation and filtering: To annotate the protein-coding genes in VS-1,
a combination of three strategies, including de novo, homolog-based, and RNA-seq–based
predictions, was used. For ab initio gene prediction, Augustus (v.2.5.5) (66) was applied with the
configuration file having been trained by BUSCO (v.2.0) (49). Three additional ab initio gene
prediction programs were used: Genescan (version 2015-10-31) (67), GlimmerHMM (v.3.0.2) (68),
and SNAP (version 2006-07-28) (69). For homology-based annotation, protein sequences of
Arabidopsis thaliana (Ensembl release 46), V. riparia (NCBI assembly GCA_004353265.1), V.
vinifera cv. Chardonnay (52), and three versions of annotation for PN40024 12x.v2 (NCBI
assembly GCA_000003745.2, Ensembl release 46 and VCost.v3 (51)) were downloaded.
Homologous sequences were aligned against the VS-1 genome assembly using TBLASTN
(v.2.2.26) (70) with parameters ‘-e 1e-5, -F F’. Genewise (v.2.2.0) (71) was used to predict gene
models based on the aligned sequences. For transcript-based annotation, quality-trimmed RNA-
seq reads were mapped to the unmasked VS-1 genome using HISAT2 (v.2.1.0) (56) with default
parameters, and StringTie (v.1.3.3b) (72) with parameter ‘-merge’ was used to combine the output
libraries to a representative set of non-redundant transcripts. Based on the abovementioned three
annotation results, a weighted and non-redundant gene set was generated by merging all of the
gene models with EvidenceModeler (v.r2012-06-25) (73). Finally, the trimmed RNA-seq data
were assembled into unigenes using Trinity (version 2.9.1) (74) with default parameters. The result
was then fed into PASA pipeline (v.2.4.1) (73) together with the EVM result for gene structure
refinement and alternative spliced isoform annotation.

To obtain reliable protein-coding gene models, we also filtered the gene set according to the
following five criteria: 1) remove a gene if more than half of its gene region was annotated as
repeat; 2) remove genes without a start or stop codon; 3) remove genes with any in-frame stop
codons; 4) remove a gene if its CDS length was shorter than 300 bp; 5) remove a gene if its CDS
length was not a multiple of three. In the end, the final reference gene set contains 34,527 protein-
coding genes with a mean transcript size of 5,275 bp, a mean coding sequence size of 1,164 bp,
and a mean number of exons per gene of 4.91 (table S8). Completeness of the annotated gene set
was evaluated by BUSCO (version 2.0) (49) with the plant-specific dataset embryophyte_odb10
(table S8). The result shows that about 95.0% of BUSCO genes are complete and the proportion
of duplicated BUSCO genes is 2.1%. These statistics are similar to those of PN40024 (96.8% and
1.5%, respectively), and better than those of Chardonnay (87.6% and 45.0%, respectively) and V.
riparia (93.0% and 31.5%, respectively) genomes.
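
The five filtering criteria listed above can be expressed as a single predicate per gene model. The sketch below is illustrative only: it assumes that the coding sequence is available as a string, that the start codon is ATG, that the standard stop codons apply, and that the repeat-annotated fraction of the gene region has been computed beforehand. The function and argument names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code


def passes_gene_filters(cds, repeat_fraction, min_cds_length=300):
    """Return True if a gene model survives the five filtering criteria.

    cds             -- coding sequence, 5' to 3', including start and stop codons
    repeat_fraction -- fraction (0..1) of the gene region annotated as repeat,
                       computed elsewhere (assumed input)
    """
    cds = cds.upper()
    # 1) more than half of the gene region annotated as repeat
    if repeat_fraction > 0.5:
        return False
    # 4) CDS shorter than 300 bp
    if len(cds) < min_cds_length:
        return False
    # 5) CDS length not a multiple of three
    if len(cds) % 3 != 0:
        return False
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    # 2) missing start or stop codon (ATG start is an assumption)
    if codons[0] != "ATG" or codons[-1] not in STOP_CODONS:
        return False
    # 3) any in-frame stop codon before the terminal one
    if any(codon in STOP_CODONS for codon in codons[1:-1]):
        return False
    return True


# Hypothetical example: a 306 bp CDS with no internal stop codon.
example_cds = "ATG" + "GCT" * 100 + "TAA"
print(passes_gene_filters(example_cds, repeat_fraction=0.1))
```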

Functional annotation: The predicted genes were further aligned to the SwissProt (75),
TrEMBL, and KEGG (76) databases by BLASTP (v.2.2.26) (77) with an E value of 1e−5, and the
most significant hits were retained. InterProScan (v.5.17-56.0) (78) was used to detect protein
motifs and domains in predicted genes against multiple databases, including Coils (v.2.2.1) (79),
Gene3D (version 3.5.0) (80), Hamap (201511.02) (81), Panther (version 10.0) (82), Pfam (v.28.0)
(83), PIRSF (v.3.01) (84), PRINTS (v.42.0) (85), ProDom (2006.1) (86), ProSite (v.20.119) (87),
SMART (v.6.2) (88), SUPERFAMILY (v.1.75) (89), and TIGRFAM (v.15.0) (90). In summary,
we were able to assign functional annotation to 33,839 protein-coding genes in the VS-1 assembly,
accounting for 94.15% of total predicted protein-coding genes (table S9).

Sample collection and processing. A total of 23 institutions from 16 nations in the world
contributed to the global grapevine cohort (18, 91–96), which comprised 2,269 V. vinifera and
1,035 V. sylvestris accessions. The V. vinifera accessions were collected from institutional
germplasms and private collections. The selection was designed to preferentially include old,
autochthonous, and economically important varieties to maximize the spectrum of genetic
diversity. The V. sylvestris accessions were collected from all major refugia in the world, which
spans a large geographical area from Levant and Transcaucasia in the east to the Iberian Peninsula
in the west (97). Total genomic DNA was either obtained from dried grapevine leaf tissues using
the CTAB Plant DNA Extraction Kit (Genenode Biotech Co, Beijing) in a wet lab at the Yunnan
Agricultural University, or directly sent from collaborators. For the latter, genomic DNA was
cleaned once by sodium acetate precipitation and reconstituted in nuclease-free water (Ambion,
Texas, USA). Sequencing libraries with an insert size of 350~550 bp were prepared with
NEBNext® Ultra™ DNA Library Prep Kit (Illumina, USA) according to the manufacturer’s
directions. Paired-end sequencing was performed on an Illumina NovaSeq 6000 platform by both
Novogene (Beijing, China) and Berry Genomics (Beijing, China). The target sequencing depth
was 20× for each accession. After excluding unusable sequencing libraries, we curated raw
genome data for 3,270 samples (2,256 V. vinifera and 1,014 V. sylvestris; success rate 99.4%),
totaling 33.96 Tb. On top of these, we also included 271 V. vinifera accessions and 73 V. sylvestris
accessions from previous publications in the following steps (7, 8, 17).

Variant calling, validation, and annotation. The raw sequencing reads were filtered with fastp
(v.0.20.0) (98). We removed reads if more than 40% of the bases have a Phred quality lower than
20. The clean paired-end reads were then mapped back to the VS-1 genome with BWA-MEM2
(v.2.0 prel; https://github.com/bwa-mem2/bwa-mem2) using default parameters. We used
Samtools (v.1.9) (99) and Picard (v.2.21.6-0; https://broadinstitute.github.io/picard) to sort the
aligned reads and remove duplicated reads. The sequencing depth, duplication rate, and percentage
of mapping of each accession were calculated with bamdst (v.1.0.9;
https://github.com/shiquan/bamdst) (tables S12 and S13). We denoted any value that was outside
mean ± 3 S.D. of these parameters to be an outlier, and excluded grapevine samples with outlier
parameters from variant calling. With this method, we retained 2,237 V. vinifera and 949 V.
sylvestris samples from our collaboration and 266 V. vinifera and 73 V. sylvestris samples from
previous publications, making the final grapevine cohort of 3,525 accessions. A single accession of
muscadine grape (ZZ-01) was included as outgroup for the downstream analyses (100).
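
The mean ± 3 S.D. rule for excluding samples can be sketched as follows for one quality parameter at a time. This is an illustration under stated assumptions: the sample standard deviation (n − 1 denominator) is used, which the text does not specify, and the sample names and depth values are invented.

```python
from statistics import mean, stdev


def flag_outliers(values, n_sd=3.0):
    """Return the names of samples whose value lies outside mean +/- n_sd * S.D.

    values -- mapping of sample name to one quality parameter
              (e.g. sequencing depth, duplication rate, or mapping rate)
    """
    mu = mean(values.values())
    sd = stdev(values.values())  # assumption: sample S.D. (n - 1 denominator)
    lower, upper = mu - n_sd * sd, mu + n_sd * sd
    return {name for name, value in values.items() if value < lower or value > upper}


# Hypothetical depths: 29 samples near the 20x target and one low-depth sample.
depths = {f"sample_{i}": 20.0 for i in range(29)}
depths["low_depth_sample"] = 5.0
print(flag_outliers(depths))
```

A sample would be dropped if it is flagged for any of the parameters, so the function would be applied once per parameter and the resulting sets combined.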

We used the chromosomes of the VS-1 genome (excluding unanchored sequences) as
references in the identification of variants (both SNP and Indel). The variant detection was carried
out with GATK3 (v.3.8; https://github.com/broadinstitute/gatk) according to the recommended
workflow (101). In brief, the variants of each accession were called using the GATK
HaplotypeCaller, and then a joint-genotyping analysis of the gVCFs was performed on all samples
(also separately for V. vinifera and V. sylvestris samples). In the filtering step, various parameters
used in the hard filtering of raw SNPs and Indels were determined according to the
recommendation of GATK (101). As a result, the SNP filter expression was set as “QD<2.0,
QUAL<30.0, SOR>3.0, FS>60.0, MQ<40.0, MQRankSum<-10.0, ReadPosRankSum<-8.0”. The
short Indel filter expression was set as “QD<2.0, QUAL<30.0, SOR>5.0, FS>100.0,
InbreedingCoeff<-0.8”. After the initial filtering step, the number of SNPs and short Indels became
56,462,680 (including 45,624,306 bi-allelic SNPs; Ti/Tv=2.24) and 11,069,435 (including
7,314,397 bi-allelic Indels), respectively. Further filtering yielded a basic set of 19,215,781 SNPs
(Ti/Tv=2.80) and 1,836,885 Indels that are bi-allelic and with less than 60% missing calls and
MAF>0.005. For many downstream analyses, the core set of 10,086,416 SNPs (Ti/Tv=2.87) and
827,214 Indels were acquired by setting the MAF cut-off at 0.05. The intergenic region of the
genome encompasses about 64.7% of SNPs and 70.0% of Indels. About 7.0% of SNPs are located
in the coding sequence, and the nonsynonymous to synonymous SNP ratio is 1.497. In comparison,
only 2.9% of Indels are found in the coding sequence. We also show that 423,625 SNPs are
predicted to be deleterious, and 151,721 Indels to cause frameshift mutations in the coding
sequence. We also calculated the ratios of transition to transversion (Ti/Tv) SNPs with the vcftools
(v.0.1.16) package (102). Notably, the ratios of Ti/Tv increased as the raw SNPs were filtered,
ranging from 2.12 to 2.87, which showcases the high quality of the SNP call sets. We also
performed variant calling separately on all V. vinifera and V. sylvestris samples (table S16). The
number of identified variants was not significantly different between the two. SNP density, Indel
density and total genetic diversity across each chromosome were calculated with 100 kb sliding
windows using vcftools (v.0.1.16) (102).
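
To make the SNP hard-filter expression and the Ti/Tv ratio concrete, the sketch below applies the thresholds quoted above to a dictionary of annotations and classifies substitutions as transitions or transversions. It is not the GATK or vcftools implementation. It assumes that a SNP is filtered out when any single condition is met and that a missing annotation does not trigger a condition; both are assumptions about how the expression is combined.

```python
# Conditions under which a SNP fails, using the thresholds quoted in the text.
SNP_FAIL_RULES = {
    "QD": lambda v: v < 2.0,
    "QUAL": lambda v: v < 30.0,
    "SOR": lambda v: v > 3.0,
    "FS": lambda v: v > 60.0,
    "MQ": lambda v: v < 40.0,
    "MQRankSum": lambda v: v < -10.0,
    "ReadPosRankSum": lambda v: v < -8.0,
}


def fails_snp_hard_filter(annotations):
    """Return True if any rule is triggered (assumed OR logic).

    Annotations absent from the record are skipped (assumption).
    """
    for key, is_bad in SNP_FAIL_RULES.items():
        value = annotations.get(key)
        if value is not None and is_bad(value):
            return True
    return False


PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """A transition is a purine<->purine or pyrimidine<->pyrimidine change."""
    pair = {ref.upper(), alt.upper()}
    return pair == PURINES or pair == PYRIMIDINES


def ti_tv_ratio(snps):
    """Ti/Tv for an iterable of (ref, alt) pairs of bi-allelic SNPs."""
    snps = list(snps)
    ti = sum(1 for ref, alt in snps if is_transition(ref, alt))
    tv = len(snps) - ti
    if tv == 0:
        raise ValueError("no transversions; Ti/Tv is undefined")
    return ti / tv


# Hypothetical records, for illustration only.
print(fails_snp_hard_filter({"QD": 1.5, "QUAL": 80.0, "MQ": 60.0}))
print(ti_tv_ratio([("A", "G"), ("C", "T"), ("A", "C")]))
```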

We validated our 3K grapevine SNP datasets with the 10,207 SNPs on a widely used 10K
grapevine SNP chip (103). Initial inspection found that the 10K grapevine SNP chip contains one
replicate of SNP1021_163, leaving the total number of unique alleles as 10,206. Since these SNPs
are based on the PN40024 V. vinifera reference genome, we found the corresponding SNP
locations in the VS-1 genome before the validation. By using a homemade Perl script, we extracted
a short 120 bp DNA sequence at the location of each SNP from the PN40024 genome so that there
is a 60 bp DNA tag on either side of the SNP. The sequences are compiled into a fasta file. We
used MegaBLAST (104) in the BLAST (v.2.231+) suite of functions to map the sequences onto
the VS-1 genome with the command “blastn -task megablast -use_index true -db VS1.final.fa -
query snp.fa -outfmt 6 -out”. This resulted in 9,797 unique mapping loci in the VS-1 genome and
9,384 unique loci in the chromosomes (table S17). Among these 9,384 unique loci, we were able
to recover 9,134 SNPs (97.34%) in our raw SNP dataset and 9,098 SNPs (96.95%) in our filtered
SNP dataset, respectively.
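
The tag-extraction step performed by the Perl script can be illustrated in Python as below. This is a sketch and not the authors' script: it assumes 1-based SNP coordinates and returns the SNP base together with the flanking bases, which gives 121 bp for a 60 bp flank. The exact convention behind the 120 bp tags described above is not stated, so the flank size is left as a parameter. Function names are hypothetical.

```python
def extract_snp_tag(chrom_seq, snp_pos, flank=60):
    """Return the sequence spanning `flank` bases on either side of a SNP.

    chrom_seq -- chromosome sequence as a string
    snp_pos   -- 1-based position of the SNP (assumed convention)
    The tag is truncated if the SNP lies close to a chromosome end.
    """
    if not 1 <= snp_pos <= len(chrom_seq):
        raise ValueError("SNP position is outside the chromosome")
    index = snp_pos - 1
    start = max(0, index - flank)
    end = min(len(chrom_seq), index + flank + 1)
    return chrom_seq[start:end]


def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text."""
    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n"


# Hypothetical 200 bp sequence and SNP position.
toy_chrom = "ACGT" * 50
tag = extract_snp_tag(toy_chrom, snp_pos=100)
print(to_fasta([("toy_snp", tag)]))
```

The resulting FASTA file would then serve as the query for the MegaBLAST search against the VS-1 genome.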

We previously reported the SNP information for 49 Vitis species based on the PN40024 V.
vinifera reference genome (8). This dataset was also used for the validation process. We extracted
the SNPs for the V. vinifera and V. sylvestris accessions from the 472 Vitis SNP dataset, obtaining
21,149,067 (MAF>0.005) and 11,839,025 (MAF>0.05) SNPs, respectively (table S18). Using the
same method described above, we mapped the SNP tags onto the VS-1 genome with default
parameters. The result showed that there are 13,071,874 (MAF>0.005) and 7,352,118 (MAF>0.05)
unique mapping loci in the chromosomes. Among them, we were able to recover 11,798,615 SNPs
(MAF>0.005; 90.26%) and 6,783,742 SNPs (MAF>0.05; 92.27%) in our raw SNP dataset and
10,761,520 SNPs (MAF>0.005; 82.33%) and 6,398,063 SNPs (MAF>0.05; 87.02%) in our filtered
SNP dataset, respectively.

Our grapevine cohort contains 59 Chasselas clones, which provide a rare opportunity to
identify somatic SNPs and test if these somatic SNPs could be recovered in our SNP datasets. We
designated sample 229 as control, and utilized Mutect2 (v.4.1.1.0) (105) with default parameters
to identify somatic mutations. Even though Mutect was developed to process genomic data from
tumor tissues, it was also used in the identification of somatic mutations in oak trees (106). The
result was filtered with the command “FilterMutectCalls” and the following criteria were set: (1)
a minimum sequencing depth of 15× at the mutant loci for the control and testing libraries; (2) no
mutant allele in the control library; (3) each somatic mutant allele was supported by 6 or more
individual reads. We identified on average 109 (range 12-248) high-quality somatic SNPs for each
of the 58 Chasselas clones (table S19). We found that on average 93.5% ± 2.8% of the somatic
SNPs could be recovered from our raw SNP dataset, and 80.7% ± 5.0% of the somatic SNPs could
be recovered from our filtered SNP dataset.
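
The three post-filtering criteria for somatic SNPs translate into a simple predicate per candidate site. The sketch below assumes that read depths and mutant-allele read counts have already been extracted from the Mutect2 output for the control and the tested clone; the argument names are hypothetical.

```python
def is_high_quality_somatic(control_depth, test_depth,
                            control_alt_reads, test_alt_reads,
                            min_depth=15, min_alt_reads=6):
    """Apply the three criteria described in the text to one candidate site.

    (1) depth of at least 15x in both control and tested library,
    (2) no mutant allele observed in the control library,
    (3) at least 6 reads supporting the mutant allele in the tested library.
    """
    return (
        control_depth >= min_depth
        and test_depth >= min_depth
        and control_alt_reads == 0
        and test_alt_reads >= min_alt_reads
    )


# Hypothetical site: well covered, absent from the control, 8 supporting reads.
print(is_high_quality_somatic(22, 19, 0, 8))
```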

We performed SNP and Indel annotation according to the VS-1 genome using the package
ANNOVAR (v.2015-12-14) (107), and predicted the effect of nonsynonymous SNPs on the
biological function of proteins with Provean (v.1.1.5) (108).

Genetic clonal accessions. To distinguish from the concept of ‘clone’ used in viticulture, we
define genetic clones as accessions sharing genetic profiles with each other. This includes cuttings,
synonyms, and mutants. The removal of genetic clones and homonyms is crucial for the proper
analyses of grapevine population structure and history. We utilized identity-by-state (IBS) sharing
pattern estimators (109–111) to infer relationship among accessions. This approach is superior to
the identity-by-descent (IBD) inference in our case in that: (1) it does not require prior knowledge
of ancestral pedigree or allele frequencies, and (2) it is robust to SNP ascertainment errors (109–
111). We removed SNPs with low read support (<7 reads) or with high linkage disequilibrium
(LD, r² ≥ 0.5) with other SNPs for the analyses. The estimators were calculated with SNPduo
(v.2.00a) (109). By using estimator values from known clonal accession pairs as reference, we set
the following three cut-off values: R1 ≥ 1.20, IBS2*ratio ≥ 0.99, and KING-robust kinship ≥ 0.3426.
We assumed a genetic clonal relationship if two of the above thresholds were met between
two accessions. We kept one accession for each distinctive genotype and marked all other clonal
accessions for exclusion from analyses.
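
The decision rule for calling a pair of accessions genetic clones can be sketched as below. The three estimator values are assumed to have been computed beforehand, and "two of the above thresholds" is interpreted here as at least two, which is an assumption.

```python
def is_genetic_clone(r1, ibs2_ratio, king_kinship, min_met=2):
    """Return True if at least `min_met` of the three cut-offs are satisfied.

    Cut-offs from the text: R1 >= 1.20, IBS2*ratio >= 0.99,
    KING-robust kinship >= 0.3426.
    """
    met = [
        r1 >= 1.20,
        ibs2_ratio >= 0.99,
        king_kinship >= 0.3426,
    ]
    return sum(met) >= min_met


# Hypothetical estimator values for one pair of accessions.
print(is_genetic_clone(r1=1.35, ibs2_ratio=0.995, king_kinship=0.30))
```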

Phylogenetic tree and network. The SNPs were processed using SNPhylo (Version 20180901)
(112) with default parameters. The resultant phylip format data were taken to construct a ML
phylogenetic tree using RAxML-NG (v.0.9.0) (113) with 32 random search trees and 100 TBE
bootstraps (114). The best tree was chosen according to the maximum Final LogLikelihood value.
A muscadine grape was included as outgroup.

For reticulate phylogenetic network construction, the SNPs with >20% missing calls and MAF <
0.05 were removed, and then PLINK (v1.90b3.38) (115) was used to remove SNPs having high
LD (r² ≥ 0.1) within a continuous window of 50 SNPs (step size 1 SNP). After converting the
SNPs to a nexus format, a phylogenetic network was constructed using SplitsTree4 (v.4.18.3)
(116).

Principal Component Analysis and ADMIXTURE. We chose the core set of SNPs (MAF
greater than 0.05) for additional pruning. PLINK (v1.90b6.12) (115) was used to remove SNPs
having high LD (r² ≥ 0.5) within a continuous window of 50 SNPs (step size 5 SNPs), which yielded
2,669,247 SNPs for both analyses. We performed PCA with GCTA (v.1.26.0) (117) using the
default settings. The first three principal components were plotted and colored according to major
viticultural region, utilization, and genetic groups, respectively.

We also examined the genetic ancestry with ADMIXTURE (v.1.3.0) (118) and determined the
choice of K using a 5-fold cross-validation (CV) procedure (119). Even though the CV error
gradually decreased from K=2 to 12, we decided to take K=8 as the optimal value. This is based
on two observations: (1) From K=8 on, the CV error decreases at a slower rate and each additional
K only reduces the CV error by 0.0015 or less; (2) At K=8, the corresponding ancestries are
sufficient to categorize both V. sylvestris and V. vinifera into distinct groups (fig. S6), reflecting
the lowest model complexity. Finally, the grouping and sorting of individuals with similar
ancestral proportions at K=8 was achieved through hierarchical clustering, so that the final
ADMIXTURE graph is easier to read.

Archetypal analysis. We chose the core set of SNPs (MAF greater than 0.05) for additional
pruning. PLINK (v1.90b6.12) (115) was used to remove SNPs having high LD (r² ≥ 0.5) within a
continuous window of 50 SNPs (step size 5 SNPs), which yielded 2,669,247 SNPs for archetypal
analyses. Archetypal analysis was performed using archetypal-analysis (120) with parameters “--
tolerance 0.0001 --max_iter 400”.

Grapevine major group characterization. Linkage disequilibrium (pairwise r² values) was
calculated across all chromosomes using PopLDdecay (v.3.41) (121) with default parameters. The
average nucleotide diversity (π) within continuous 100 kb sliding windows, pairwise population
fixation index (FST), and individual heterozygosity were calculated with VCFtools (v.0.1.16)
(102).

Isolation-by-distance analysis. The pairwise population fixation index (FST) among all
viticultural countries/regions (minimum three individuals required) was calculated with VCFtools
(v.0.1.16) (102). The centroid latitudes and longitudes of countries/regions were used to calculate
the haversine distances with the R function ‘distHaversine’. A scatterplot of FST against haversine
distance was used to fit a linear regression between the two variables. A Mantel test was used to
compare the similarity of the FST and haversine distance matrices.
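
For readers unfamiliar with the haversine distance, the following Python sketch computes the great-circle distance between two centroids. It is a re-implementation for illustration and not the R function used in the analysis. The default radius is the WGS84 equatorial radius in metres, which is an assumption, and the coordinates in the example are approximate, hypothetical centroids.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2, radius_m=6378137.0):
    """Great-circle distance in metres between two points given in degrees.

    radius_m defaults to the WGS84 equatorial radius (assumed value).
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # min() guards against rounding slightly above 1 for antipodal points
    return 2 * radius_m * math.asin(min(1.0, math.sqrt(a)))


# Hypothetical centroids (latitude, longitude) of two regions.
region_a = (42.0, 43.5)
region_b = (40.0, -4.0)
print(round(haversine_m(*region_a, *region_b) / 1000), "km")
```

The pairwise distances obtained this way form the geographic distance matrix that is compared with the FST matrix.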

Ecological niche modelling. We compiled 41 and 16 different geographical records from all
identified Syl-W and Syl-E accessions, respectively, for the analysis. The raster files of 19
bioclimatic variables at 2.5 minutes resolution for the Last Glacial Maximum (LGM, ca. 21 ka,
v1.2b) and early Holocene (EH, Greenlandian, 11.7-8.326 ka, v1.0) paleoclimate data were
obtained from PaleoClim (122). Since removing highly collinear variables has an insignificant
impact on maximum entropy model performance (123), we included all original variables in the
analysis. The R package ENMeval (v.0.3.1) (124) was used to test all combinations of defined
settings and perform cross validation for model evaluation. For the Syl-W ecotype, the settings of
LQ_1, LQH_2.5 were chosen to measure variable importance for the LGM and EH, respectively,
whereas for the Syl-E ecotype, the settings of LQ_1.5 and LQ_4 were selected. Then the
projections for habitat suitability were generated in MaxEnt (v.3.4.4) (125) from the ENMeval
results with the parameters of 10 subsample replicated runs and 30 random test percentage.

Demographic history inference. First, we employed the MSMC2 (126) to infer population size
and split time. The input files for MSMC2 were generated with MSMC Tools
(https://github.com/stschiff/msmc-tools). In brief, bi-allelic SNP sites with uniquely mapped reads
and 0.5 to 2-fold mean coverage depths were used in the analyses, and the remaining genomic
regions were masked using the script bamCaller.py. Then all segregating sites within each group
were phased using SHAPEIT (v.2.r904) (127). Single population demographic inference was
performed on four individuals (eight haplotypes), whereas population split inference was
performed on two individuals (four haplotypes) for each group. Only grapevine accessions with
the highest proportion of major ancestries (top 50 or major ancestry > 70%) were randomly chosen
for the inference. Single population demographic inference was repeated ten times for each group.
Median population split times were deduced from the results of 100 random combinations for each
comparison. We used a mutation rate of 5.4×10⁻⁹ per site per generation and a generation time of
3 years for demographic history inference (8).
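
The mutation rate and generation time enter the analysis when the scaled MSMC2 output is converted to years and effective population sizes. The sketch below follows the scaling convention described in the MSMC documentation (time divided by the mutation rate gives generations; the inverse coalescence rate divided by twice the mutation rate gives the effective population size). The input values in the example are hypothetical.

```python
MUTATION_RATE = 5.4e-9   # per site per generation, as stated in the text
GENERATION_TIME = 3.0    # years, as stated in the text


def scaled_time_to_years(scaled_time, mu=MUTATION_RATE, gen=GENERATION_TIME):
    """Convert a mutation-scaled time boundary to years."""
    return scaled_time / mu * gen


def coalescence_rate_to_ne(scaled_rate, mu=MUTATION_RATE):
    """Convert a scaled coalescence rate (lambda) to effective population size."""
    return (1.0 / scaled_rate) / (2.0 * mu)


# Hypothetical scaled values, for illustration only.
print(scaled_time_to_years(5.4e-5))
print(coalescence_rate_to_ne(1000.0))
```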

The stairway plot 2 (v.2.1) (128) was also used for estimating the population demography
history for V. sylvestris from SNP frequency spectrum. We filtered out SNP sites in the coding
sequence region, masked genomic regions of repetitive elements, and applied a mask so that short
sequencing reads can be uniquely mapped to chromosomal regions (128). For each
population, we only included accessions with the highest proportion of major ancestries (50 for
Syl-W1, 58 for Syl-W2, 51 for Syl-E1, and 34 for Syl-E2). We estimated folded SFS using
easySFS (https://github.com/isaacovercast/easySFS). Population history was predicted by
ignoring singletons and 200 bootstraps were run to assess confidence intervals. We plotted the
change of estimated median effective population size through time and the associated 95%
confidence intervals (2.5% and 97.5% percentiles).

We used Momi2 (v.2.1.19) (129) to explore demographic models for various sets of four
populations. Five individuals with the highest proportion of major ancestries were included in each
population. We filtered out SNP sites in the coding sequence and genomic regions of repetitive
elements. The extracted folded site frequency spectrum (SFS) was split into 100 equal-sized blocks
for jackknifing and bootstrapping. One gene flow event and constant population size were assumed
for a set of four-population comparison. The split times of Syl-W/Syl-E and Syl-E1/CG1 were
based on the MSMC2 results, where the interquartile range (25% to 75%) was fed into Momi2.
We fitted 20 independent runs with random starting parameters and selected the demographic
model with the biggest log-likelihood value of all runs. Then 100 bootstraps for the best model
were implemented by resampling blocks of the SFS to generate confidence intervals.

Selective sweep signals. We investigated the selection signals across the whole genome via a cross
comparison of the genetic differentiation (FST) and nucleotide diversity (π). A 50 kb sliding
window with 10 kb step approach was applied to quantify FST and π by using the VCFtools
software (v0.1.16) (102). The candidates that meet both top 5% of the two values were selected as
selective signals.
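
The window scheme and the joint top-5% criterion can be sketched as follows. The per-window FST and π-based scores are assumed to have been computed beforehand (for example with VCFtools); how the π values of the two groups are combined into a single score is not specified above, so the second input is treated as a generic score where larger means stronger signal. All names and values are illustrative.

```python
def sliding_windows(chrom_length, window=50_000, step=10_000):
    """Yield 1-based inclusive (start, end) coordinates of sliding windows."""
    start = 0
    while start < chrom_length:
        yield start + 1, min(start + window, chrom_length)
        start += step


def top_fraction_cutoff(values, fraction=0.05):
    """Smallest value that still belongs to the top `fraction` of values."""
    ordered = sorted(values, reverse=True)
    k = max(1, int(len(ordered) * fraction))
    return ordered[k - 1]


def candidate_windows(fst, pi_score, fraction=0.05):
    """Indices of windows in the top fraction for both statistics."""
    fst_cut = top_fraction_cutoff(fst, fraction)
    pi_cut = top_fraction_cutoff(pi_score, fraction)
    return [
        i for i, (f, p) in enumerate(zip(fst, pi_score))
        if f >= fst_cut and p >= pi_cut
    ]


# Hypothetical scores for 100 windows in which both statistics rise together.
toy_fst = list(range(100))
toy_pi = list(range(100))
print(candidate_windows(toy_fst, toy_pi))
```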

Treemix. We estimated admixture graphs of grapevine groups using TreeMix (v.1.12), which
applies a ML method based on a Gaussian model of allele frequency change (130). For each group,
individuals with at least 75% major ancestries (also average Syl-W ancestry in each V. vinifera
group <3%) were used. SNPs were filtered for missing calls and monomorphism. The topology of
the ML trees changes depending on the number of migration edges (m) allowed in the model. The
optimal number of migration edges was determined from the range of one to ten using the R package
OptM (v.0.1.6) (131). The TreeMix program was run with “-bootstrap 1000 -k 500”. The Syl-E1
group was set as root. For each migration event, we constructed the tree with migration edges 10
times using random seed. The best outcome was determined by the biggest residual value.

f-statistics, Patterson’s D, and local introgression region. Individuals with at least 75% major
ancestries were used for each group. Outgroup f3 statistics were calculated using the R package
admixr (v.0.9.1) (132) for all possible combinations of grapevine groups with Vitis rotundifolia as
the outgroup. The Patterson’s D and f4 admixture ratio for all possible combinations of trios of the
grapevine groups were calculated using Dtrios in Dsuite (v. 0.4 r42) (133) with V. rotundifolia as
the outgroup. It is worth noting that Dsuite does not assume prior knowledge of the tree, and orders
the test trio in a way so that the BBAA pattern is more common than ABBA and BABA patterns.
Dsuite also orders the position of P1 and P2 so that the resultant D statistic is always positive.
SNPs were filtered for missing calls and monomorphism. To further locate the local introgressed
genomic regions, the df and fdM statistics were calculated along the whole genome using
Dinvestigate in Dsuite with a sliding window of 50 SNPs and a step of 5 SNPs. We defined the
putative introgressed regions as those among top 1% of both values and visualized these regions
with R.
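
For readers unfamiliar with the ABBA/BABA framework, the following minimal sketch shows how Patterson's D can be computed from derived-allele frequencies in a trio (P1, P2, P3) plus an outgroup. It follows the standard frequency-based definition and is not a reimplementation of Dsuite; the function name and input layout are hypothetical.

```python
def patterson_d(sites):
    """Compute Patterson's D from per-site derived-allele frequencies.

    sites: iterable of (p1, p2, p3, p_out) tuples, one per SNP.
    Returns (ABBA - BABA) / (ABBA + BABA), or 0.0 if no site is informative.
    """
    abba = 0.0
    baba = 0.0
    for p1, p2, p3, p_out in sites:
        # ABBA: derived allele shared by P2 and P3, ancestral in P1 and outgroup.
        abba += (1 - p1) * p2 * p3 * (1 - p_out)
        # BABA: derived allele shared by P1 and P3, ancestral in P2 and outgroup.
        baba += p1 * (1 - p2) * p3 * (1 - p_out)
    total = abba + baba
    if total == 0:
        return 0.0
    return (abba - baba) / total
```

Swapping P1 and P2 flips the sign of D, which is why a tool can always report a positive value by choosing the order of the two populations.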

Sex determination region haplotypes. Positions of SDR-related SNPs were obtained from a
previous publication (33) and the corresponding SNPs in the VS-1 genome were obtained. The
genotypes of the SDR were processed to generate haplotypes and converted to NEXUS format by
DnaSP (v5) (134). Geographical and group categorization information was associated with
haplotypes in the NEXUS file as a trait block. PopART (v.1.7) (135) was used to construct haplotype
networks using the median-joining method.

Genome-wide association study. We performed a genome-wide association (GWA) study on
muscat and non-muscat grapevines using the fastGWA-GLMM method (136) in GCTA (v.1.93.3beta)
(117). For the binary categorization, the muscat phenotype (n=134, tables S1 and S14) was defined
as 1 and the non-muscat phenotype (n=158) as 0. The non-muscat grapevines were selected from CG1,
the earliest domesticates. SNPs with a missing call rate greater than 0.2 or a minor allele frequency
less than 0.01 were filtered out. We defined the whole-genome significance cut-off as -log10 (P) = 6.

We also performed a GWA analysis on berry skin color using the MLMA-LOCO model in
GCTA (v.1.93.3beta) (117) to control for the impact of population stratification. The phenotypes for
all cultivated grapevines were obtained from VIVC and assigned categorical values as follows:
green-yellow=1; rose=2; red=3; red-black=4. SNPs with a missing call rate greater than 0.3 or a minor
allele frequency less than 0.01 were filtered out. We defined the whole-genome significance cut-off
as -log10 (P) = 6.
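
The phenotype coding, SNP filters, and significance cut-off for the berry skin color analysis can be written out as a small sketch. This is an illustration only and does not call GCTA; the helper names are hypothetical, and treating the boundaries as inclusive (a SNP at exactly the threshold is kept, a P value at exactly the cut-off is significant) is an assumption.

```python
import math

# Categorical coding of berry skin color as stated in the text.
SKIN_COLOR_CODES = {"green-yellow": 1, "rose": 2, "red": 3, "red-black": 4}


def encode_skin_color(label):
    """Map a VIVC-style skin color label to its categorical value."""
    return SKIN_COLOR_CODES[label]


def keep_snp(missing_rate, maf, max_missing=0.3, min_maf=0.01):
    """Keep a SNP unless its missing rate is too high or its MAF too low."""
    return missing_rate <= max_missing and maf >= min_maf


def is_genome_wide_significant(p_value, cutoff=6.0):
    """Apply the -log10(P) cut-off."""
    return -math.log10(p_value) >= cutoff
```

For the muscat analysis the same filter would be called with `max_missing=0.2`.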

Supplementary Text
Parent-offspring relationships. We also collected known parent-offspring relationships from the
Vitis International Variety Catalogue (VIVC; www.vivc.de) and used their IBS sharing pattern
estimators to determine cut-off values for first-degree relationship candidates. A total of 10,181
accession pairs met all four estimator criteria: R0 ≤ 0.096, R1 ∈ [0.5, 1.20), KING-robust
kinship ∈ [0.210, 0.3426), and IBS2*ratio ∈ [0.912, 0.99). We then manually screened all candidates
to identify 194 close-cross relationships (e.g., backcross), 1,356 parent-offspring relationships, and
214 full sibling relationships (fig. S3 and table S23).
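
The four estimator criteria can be expressed as a single predicate. The sketch below reads the thresholds as R0 at most 0.096 and half-open intervals for R1, KING-robust kinship, and IBS2*ratio; the function name is hypothetical, and the estimators themselves are assumed to have been computed beforehand for each accession pair.

```python
def is_first_degree_candidate(r0, r1, king_kinship, ibs2_ratio):
    """Return True if a pair meets all four first-degree candidate criteria."""
    return (
        r0 <= 0.096
        and 0.5 <= r1 < 1.20
        and 0.210 <= king_kinship < 0.3426
        and 0.912 <= ibs2_ratio < 0.99
    )
```

Pairs passing this filter are only candidates; the text describes a subsequent manual screening step to separate close-cross, parent-offspring, and full sibling relationships.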

Large viticultural regions. Major viticultural countries in the world are usually categorized into
larger regional groups for both clarity and convenience. We based our categorization on a previous
report (19) but made minor modifications (fig. S3). Namely, the changes are: (1) China, Japan and
South Korea form an independent Eastern Asia regional group; (2) Iran is grouped in the Caucasus,
in that the individuals are closer to Armenian and Azerbaijani samples on the PCA plots; (3) Turkey
is not grouped with any other close-by regions. This is to better showcase the Turkish V. sylvestris
samples on the PCA plots, as they do not form a close cluster with either Caucasian or Balkan
individuals. We also agree with the previous report on listing Italy as its own group. The number
of accessions is above 50 for the majority of viticultural regions in the Eurasian continent. Maghreb
and Central Asian V. sylvestris accessions are not readily available to the field, since climate
change and social instability in the region have prevented field investigation in the past decade.
For V. vinifera, the number of table, wine, and dual-purpose grapevines accounts for 88% of the
core cultivated accessions.

Description of the V. sylvestris accessions. Four V. sylvestris groups with distinctive ancestries
are found. In the east, the Syl-E1 accessions (K2 red: 84.2%±6.6%) are limited to the banks of the
Jordan River and the Sea of Galilee in the northern Levant, whereas the Syl-E2 accessions (K6
navy blue: 72.7%±8.9%) are mainly found in the South Caucasus and on the southern shore of the
Caspian Sea. In the west, the Syl-W1 accessions (K1 sky blue: 94.6%±7.4%) are mainly located close to
the Danube River and the upper Rhine River, whereas the Syl-W2 accessions (K8 pink:
69.7%±10.8%) grow in the Iberian Peninsula and southwest France (fig. S8).

The admixed V. sylvestris accessions form several discrete clusters according to the
hierarchical clustering topology (fig. S8).

Syl-Admix1: The Syl-Admix1 accessions are composed predominantly of K1
(57.5%±7.4%; sky blue) and K8 (29.6%±7.4%; pink) genetic ancestries. Their geographic
locations include Eastern France (districts Corsica, Nièvre, Bas-Rhin, and Alpes-Maritimes),
Switzerland, Italy (northern Italy, Sardinia), and the Western Balkans (Croatia, Bosnia and
Herzegovina). This area is the intermediate zone between Syl-W1 (K1) in the north (Germany,
Austria, and Hungary) and Syl-W2 (K8) in the south (Iberia).

Syl-Admix2: The Syl-Admix2 accessions are mainly from Northern Black Sea (Crimea)
and Eastern Balkan (Bulgaria). Besides K1 being the predominant ancestry component
(57.6%±9.3%; sky blue), these accessions contain a sizeable portion of K6 ancestry
(19.6%±6.6%). The third largest ancestry component is K8 (14.6%±4.2%; pink). The proportion
of K6 suggests a genetic influence from the Caucasian eastern ecotype Syl-E2.

Syl-Admix3: The Syl-Admix3 accessions are mainly from Eastern Turkey and Italy. These
accessions are characterized by highly admixed ancestries, which suggests intensive introgression
from V. vinifera into V. sylvestris.

Syl-Admix4: The Syl-Admix4 accessions come from the same region as Syl-Admix1. The
predominant ancestry components are also K1 (39.2%±6.1%; sky blue) and K8 (32.6%±7.9%;
pink). But Syl-Admix4 contains higher proportions of K2 to K7, which suggests intensive
introgression from V. vinifera into V. sylvestris.

Syl-Admix5: The Syl-Admix5 accessions in the Iberian region are characterized by the K7
(22.5%±9.8%; yellow) and K8 (58.5%±13.5%; pink) ancestries. Since K7 is associated with
Iberian cultivated grapevines, Syl-Admix5 represents local introgressed hybrids in the Iberian
Peninsula.

Syl-Admix6: The Syl-Admix6 accessions were collected from Turkey, Iran, and Armenia.
They are characterized predominantly by the K2 (25.4%±12.8%; red) and K6 (36.5%±14.6%; navy blue)
ancestry components and represent the admixture between the two eastern ecotype groups.
However, these accessions also contain sizeable proportions of other ancestries representing
cultivated grapes.

Description of the V. vinifera accessions. The ADMIXTURE analysis revealed that very few V.
vinifera accessions contain 100% of a specific genetic ancestry (fig. S8, table S25). This reflects
the intensive hybridization history among V. vinifera accessions. We plotted the cultivated
grapevines on a tri-plot according to the proportions of K2, K5, and the sum of all other K
components (fig. S8). The result shows that K2 and K5 are associated with table grapevines,
whereas all other K components are associated with wine grapevines.

CG1: The CG1 cultivars can trace their birthplaces to a large geographical area, which covers
East Asia, Central Asia, Western Asia, Caucasus, Northern Black Sea, and Northern Africa. The
genetic ancestry of the CG1 group is very similar to that of the Syl-E1 group, where the
characteristic K2 (red) component on average accounts for 73.9%±10.3% of the total ancestry.
‘Asswad Karech’ (Fre61), ‘Amud’ (IS39), and ‘Safsufa R’ (IS52) are Western Asia cultivars with
the highest K2 component. Given that K2 is associated with table grapevines, we reason that the
CG1 cultivars represent Western Asia table grapevines.

CG2: The CG2 cultivars bear resemblance to the Syl-E2 individuals in genetic ancestry,
where K6 (navy blue) is the predominant component (66.4%±17.1%). They are mainly located in
the Caucasus and Northern Black Sea region. The Georgian grapevine ‘Kisi’ (GE29) is the only
cultivar in this study having a pure K6 ancestry. Other grapevines with a high K6 ancestry
component include ‘Kurkena’ (GE25), ‘Ghvinis Tsiteli’ (GE13), and ‘Khikhvi’ (GE23). We
reason that the CG2 cultivars represent Caucasus wine grapevines.

CG3: A key feature of this group is the large number of muscat grapevines and their
descendants for table or dual-purpose usage. In particular, ‘Muscat Hamburg’ and ‘Königin der
Weingärten’ are the most popular parental varieties with a pure K5 (purple) ancestry. At the group
level, the K5 component accounts for 87.6%±11.2% of the total ancestry. The geographical
distribution of CG3 cultivars is quite diffuse, spanning from Eastern Asia to Western Europe.
Even though intercross among muscat grapevines is common, it should be noted that not all
descendants inherit the muscat aroma. With this said, we reason that CG3 cultivars represent
muscat grapevines.

CG4: The CG4 cultivars are mainly distributed in the Balkans and characterized by the major
K4 (orange) ancestry component (69.9%±17.4%). ‘Crimposie’, ‘Furmint’, ‘Fekete Balafant’,
‘Plavaie’, and ‘Armas’ are the cultivars with a pure or close to pure K4 ancestry. We define the
CG4 cultivars as Balkan wine grapevines.

CG5: The CG5 group represents Iberian grapevines that contain a major K7 (yellow) ancestry
component (68.8%±12.8%). Cultivars with a pure K7 ancestry include ‘Cayetana Blanca’,
‘Heben’, ‘Boal Vencedor’, and ‘Zalema’. We define the CG5 cultivars as Iberian wine grapevines.

CG6: The CG6 group is mainly associated with a K3 (dark brown) ancestry component
(68.4%±12.2%) and a major distribution area in France and Germany. Cultivars with a pure K3
ancestry include ‘Gros Noir A Tacher’, ‘Savagnin Blanc’, ‘Pinot Noir’, and ‘Bequignol Blanc’.
We define the CG6 cultivars as Western European wine grapevines.

The hierarchical clustering result reveals three major groups of admixed V. vinifera
accessions (C-Admix1-3; fig. S8). The C-Admix1 group represents the diverse breeding outcome
between muscat grapevines (K5: purple, 48.0%±11.1%) and other groups (K2, K3, K4, and K7).
The C-Admix2 group represents the breeding descendants between Western Asia table grapevines
(K2: red, 39.7%±9.6%) and other groups (K4, K5, K6, and K7). The C-Admix3 group contains
accessions of assorted genetic ancestry combinations, including Balkan wine/Iberian wine
(CG4/CG5) grape crosses, Balkan wine/Western European wine (CG4/CG6) grape crosses,
Iberian wine/Western European wine (CG5/CG6) grape crosses, and accessions with more than
four genetic components. Of note, there are very few cultivars descending from a cross between
Caucasian wine grapevines and other groups.

The V. vinifera accessions with sizeable V. sylvestris ancestries. As six cultivated grapevine
groups and their corresponding genetic ancestries are defined above, we are able to identify
cultivars having a sizeable wild western ecotype ancestry at K=8 (table S25 and fig. S9).
Representative cultivars include ‘Riesling Blau’, ‘Manseng’ cultivars, and ‘Lambrusco’ cultivars,
all of which were shown in previous studies to be V. vinifera and V. sylvestris hybrids (11, 93,
137). The ancestry composition reveals details of the hybridization. On the one hand, the ratio of
V. vinifera to V. sylvestris ancestries is approximately 1:1, suggesting that some cultivars
were established through a single cross between cultivated and wild accessions. On the other hand,
the proportion of K1 and K8 (sky blue and pink) may inform the type of V. sylvestris used in the
cross and possibly the large region where the cross occurred. For instance, ‘Riesling Blau’ has a
higher proportion of K1 (40.3%). This suggests that the parental V. sylvestris belongs to Syl-W1
and that the proposed place of origin is where Syl-W1 could be found (e.g., Germany). In
comparison, ‘Petit Manseng’ has a higher proportion of K8 (48.1%), which indicates the cross with
Syl-W2 in Western Europe. The proportions of K1 and K8 in the Italian Lambrusco cultivars are
of similar size (about 25% each). This suggests a parental V. sylvestris of the admixed nature.

Description of shared and unique domestication signatures. In the main text, we listed gene
examples that are associated with shared and unique domestication signatures (full list in tables
S28 and S29). The descriptions of these genes and their functions are detailed below. As gene
functional annotation depends on homology-based inference, it should be noted that these
grapevine genes require additional verification for their true functions in the plant.

Vvsyl02G000297 (NPF): The protein product of this gene is predicted to be an NRT1/PTR
protein 4.5, which belongs to the large NITRATE TRANSPORTER 1/PEPTIDE
TRANSPORTER family. These membrane proteins are known to transport a variety of molecules
(e.g., nitrate, peptides) in plants, thereby playing important roles in plant development and growth
(138).

Vvsyl08G001229 (FER4): This gene product is predicted to be the iron storage protein
ferritin-4. Along with other ferritins, it not only helps maintain iron homeostasis but also
plays a role in the oxidative response in plants (139).

Vvsyl09G001081 to Vvsyl09G001083 (GA2OX): These genes, together with Vvsyl09G001084
and Vvsyl09G001086 (both GA3OX), form a gibberellin oxidase gene cluster on chromosome 9.
They are involved in the biosynthesis and degradation of the plant hormone gibberellin. It has
been shown that grapevine gibberellin oxidases play an important role in the regulation of
flowering and fruit-set (140).

Vvsyl17G000504 to Vvsyl17G000506 (PPR): The pentatricopeptide repeat-containing
proteins are mainly localized to plant organelles. They play important roles in regulating plant
physiology, development and biotic/abiotic responses (141, 142).

Vvsyl17G000525, Vvsyl17G000526, and Vvsyl17G000528 (RNF181): This E3 ubiquitin
ligase gene cluster is on chromosome 17. The proteins of this gene family are involved in the
ubiquitination-mediated protein degradation pathway, thereby regulating a myriad of plant
biological processes, including growth, reproduction, and biotic/abiotic response (143).

Vvsyl05G001656 to Vvsyl05G001658 (MecgoR): This gene cluster encodes proteins with the
predicted function of methylecgonone reductase, which is essential in the biosynthesis of
tropane alkaloids in plants (144).

Vvsyl07G001732 and Vvsyl07G001733 (TR2): This gene cluster encodes tropinone
reductase homologs in grapevine. This family of enzymes controls a branch point in the tropane
alkaloid biosynthetic pathway in plants (144).

Vvsyl17G000431 to Vvsyl17G000435 (SSL): This gene cluster encodes strictosidine
synthase-like proteins. They catalyze the biosynthesis of strictosidine, a precursor molecule to
more than 2000 indole alkaloid compounds in plants (145).

Vvsyl05G001664 (SWEET17): This gene belongs to the bidirectional sugar transporter gene
(SWEET; sugars will eventually be exported transporter) family in plants. Alongside its function
in carbohydrate transport and homeostasis, the SWEET proteins have also been implicated in the
biotic/abiotic response in grapevine (146, 147).

Vvsyl06G000699 (PFKFB1): The 6-phosphofructo-2-kinase/fructose-2,6-bisphosphatase is
a key enzyme in the glycolysis signaling pathway in all cells.

Vvsyl15G001098 and Vvsyl15G001099 (BEAT): The products of these two genes (acetyl-
CoA:benzylalcohol acetyltransferase) are able to catalyze the biosynthesis of benzyl acetate. This
enzyme has been implicated in the production of floral scent in Clarkia breweri and Prunus mume
(148, 149).

Vvsyl03G001186 (UFGT): The function of this gene is predicted to be anthocyanidin 3-O-
glucosyltransferase. This important enzyme is involved in the anthocyanin biosynthesis and
grapevine berry color improvement (150).

Vvsyl05G001489 and Vvsyl05G001490 (UPL6): These two genes are members of the E3
ubiquitin ligase gene family. The proteins of this gene family are involved in the ubiquitination-
mediated protein degradation pathway, thereby regulating a myriad of plant biological processes,
including growth, reproduction, and biotic/abiotic response (143).

Vvsyl17G000415 to Vvsyl17G000417 (WAK): This gene cluster encodes the putative wall-
associated receptor kinase-like proteins. These transmembrane receptors bind to pectin in response to
pathogen attack or wounding, and initiate defense mechanisms in plant cells (151).

The domestication time gap between genomic inference and archaeological data. We briefly
present the available archaeological data for grapevine and discuss the possible reasons that may
underlie the domestication time gap between genomic inference and archaeological data.

The domestication of fruit trees is an indispensable component of the human transition to sedentism
in the Neolithic. In West Asia, archaeological evidence dates the domestication of perennial fruit
crops to between 8,500 and 5,500 ya (152). So far, the great majority of grapevine remains in
archaeological excavations have been seeds. The categorization of these archaeological finds is based
upon the observation that V. sylvestris and V. vinifera differ in seed morphology (2). As shown in
table S35, the first appearance of domestic-type grapevine seeds in Western Asia was during
the Early Bronze Age, whereas seeds from previous periods were of the wild type. In the Caucasus
region, grapevine pips were found in the Late Chalcolithic period site Areni-1 from Armenia, dating
back to ~8,000 BP. Therefore, the consensus in the archaeobotanical community is that the
domestication of perennial fruit trees (e.g., grapevines) lagged behind the domestication of
annual grains (152). One theory holds that fruit tree cultivation is labor-intensive (153).
The long time investment means a delayed return for early humans, suggesting that the first
settled agriculturalists were unlikely to domesticate fruit trees from the very start (153).

The genomic inference dates the domestication of grapevines to the Early Neolithic at ~11,000 ya,
similar to that of the annual grain domestication. Even though the estimate is a great improvement
over previous reports (~15 to 400 Kya) (7–9), a 2,500- to 3,000-year time gap between genomic and
archaeological findings remains unresolved. There may be a few reasons for this gap. On
the one hand, the use of grapevine seed morphology only provides indirect evidence of
domestication. This is in contrast to the seeds of grains, where an increase in seed size or the loss
of shattering in archaeological samples provides more straightforward evidence of domestication.
In addition, binary categorization of the seed shape misses the information on the intermediate
state. Both factors could lead to an underestimate of the grapevine domestication time from the
archaeological remains. On the other hand, model-based genomic inference relies on the choice of
many parameters, which may introduce uncertainties. (1) Generation time: if grapevines had a
shorter generation time than the juvenile period of three years in the past (fig. S26), the
domestication time could be overestimated. (2) Ghost progenitor populations: though the inference
does not support the existence of such populations (fig. S27), the domestication time would be
revised down if future archaeological evidence of an extinct progenitor population were to
emerge. That said, paleogenomic data may help resolve the time gap in the future.
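
The effect of the generation time assumption is a simple rescaling, sketched below. The sketch assumes that the ~11,000 ya estimate was obtained with a three-year generation time, as the text suggests; the alternative value of two years and the function name are purely hypothetical and serve only to show the direction of the bias.

```python
def rescale_time(years, assumed_generation_time, alternative_generation_time):
    """Rescale a time estimate in years to a different generation time.

    The number of generations is held fixed; only the years-per-generation
    conversion changes.
    """
    generations = years / assumed_generation_time
    return generations * alternative_generation_time


# Hypothetical example: the same number of generations at two years per
# generation instead of three gives a more recent date in years.
younger_estimate = rescale_time(11_000, 3, 2)
```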

Fig. S1. The genome assembly of a V. sylvestris accession ‘VS-1’. (A) Pseudo-chromosomes of
the VS-1 genome assembly. Numbers correspond to the chromosome numbers used in the V.
vinifera genome assembly PN40024 (12X.v2). (B) Syntenic relationship between the VS-1
genome assembly and PN40024 (12X.v2). (C) Comparison of the anchored chromosome lengths
in the VS-1 and PN40024 (12X.v2) genome assemblies.

[Fig. S2 image. Panel A: density plot of SNPs, Indels, and π across Chr 1 to 19. Panel B: counts of SNPs, Indels, and total by genomic location (exonic, intronic, splicing, UTR, intergenic) for V. vinifera and V. sylvestris. Panel C: frequency vs. MAF bracket for coding, noncoding, synonymous, nonsynonymous, fourfold degenerate, and deleterious SNPs. Panel D: frequency vs. Indel size (bp). Panel E: frequency vs. MAF bracket.]

Fig. S2. Characterization of SNPs and small Indels from 3,648 V. sylvestris and V. vinifera
accessions. (A) Density plot of SNPs, small Indels (<40 bp), and nucleotide diversity (π) across
19 chromosomes of the VS-1 genome. (B) Tabulation of SNPs and small Indels according to the
different locations in the genome. (C) Frequency spectrum of SNPs according to the minor allele
frequency brackets and functional annotation. (D) Size frequency of small Indels in the genome.
(E) Frequency spectrum of small Indels according to the minor allele frequency brackets.

Fig. S3. Identification of core V. sylvestris and V. vinifera accessions in the total sample
cohort. (A) Schematic flowchart for the selection of 2,448 core V. sylvestris and V. vinifera
accessions from the total cohort. (B) Identification of clonal, close-cross (e.g., backcross),
parent-offspring, and full sibling relationships among 3,525 accessions according to identity-by-
state (IBS) sharing patterns. The majority of clonal relationships are among V. vinifera
individuals and shared by less than five accessions. PO, parent-offspring; FS, full sibling; IBS,
identity-by-state. (C) Categorization of core accessions according to the major viticultural
regions. W. Asia, Western Asia; E. Asia, Eastern Asia; Rest. World, Rest of World; C. Asia,
Central Asia; Rus/Ukr, Russia/Ukraine; E. Euro, East Europe; C. Euro, Central Europe; W. Euro,
West Europe.

[Fig. S4 image. Panels A to C each show PC 1 (7.56%) vs. PC 2 (1.71%) and PC 2 (1.71%) vs. PC 3 (1.26%). Legends: viticultural regions with V. vinifera and V. sylvestris markers (A); Table, Wine, Table/Wine, Raisin/other (B); Syl-E1, Syl-E2, Syl-W1, Syl-W2, CG1 to CG6 (C).]

Fig. S4. Principal component analyses of 2,448 core grapevine accessions. The projections
are colored according to major viticultural regions (A), grapevine utilization (B), and major
grapevine groups (C). The large square and circle in (A) represent the median positions.
Uncategorized and admixed accessions are greyed out.

[Fig. S5 image. Panels A and B each show PC 1 (8.16%) vs. PC 2 (1.57%). Legends: viticultural regions with V. vinifera and V. sylvestris markers (A); Syl-E1, Syl-E2, Syl-W1, Syl-W2, CG1 to CG6 (B).]

Fig. S5. Principal component analyses of 2,448 core grapevine accessions. The principal
component analysis was performed on V. sylvestris accessions and V. vinifera accessions were
projected onto the graph. The projections are colored according to major viticultural regions (A)
and major grapevine groups (B). Uncategorized and admixed accessions are greyed out.

[Fig. S6 image. Panel A: circular tree with zoomed-in Clade I (mainly table use) and Clade II (mainly wine use), each with a 1st and 2nd cluster; scale bars 0.05; markers for TBE bootstrap ≥ 0.70; branch color by V. vinifera / V. sylvestris; viticultural region color code. Panel B: proportions of Table, Wine, Table/Wine, and Raisin/Other grapevines in each cluster.]

Fig. S6. Maximum likelihood phylogenetic tree of 2,448 core grapevine accessions. (A)
Circular presentation of the maximum likelihood phylogenetic tree with 100 TBE bootstraps.
Two major clades are zoomed-in. Each clade contains two smaller clusters. V. sylvestris from
Western Asia is located in the clade with a majority of table grapes. V. sylvestris from Caucasus
and the rest of Europe is located in the clade with a majority of wine grapes. Stars show TBE
values greater than 0.70. Small dark circles and blue circles in the zoomed-in clades represent
collapsed accessions for clarity. (B) The proportion of table, wine, table/wine, and other types of
grapevines in each cluster. C. Asia, Central Asia; E. Euro, East Europe; C. Euro, Central Europe;
W. Euro, West Europe.

[Fig. S7 image. Network with Clusters 1 to 5; scale bar 0.01; region color code; V. sylvestris and V. vinifera markers.]

Fig. S7. Reticulate phylogenetic network of 2,448 core grapevine accessions. The accessions
are colored according to the major viticultural regions. A total of five major clusters could be
identified. Cluster 1 contains V. sylvestris from Western Asia and major table grapevines.
Cluster 2 contains V. sylvestris from the Caucasus and major Caucasian wine grapevines. Cluster
3 contains a majority of European wine grapevines. Cluster 4 contains mostly western wine
grapevines. Cluster 5 contains V. sylvestris from the rest of the west Eurasian continent.

[Fig. S8 image. Panel A: ancestry proportions (0.0 to 1.0) at K=2 to K=8 for V. sylvestris and V. vinifera accessions, ordered as Syl-W1, Syl-W2, Syl-E2, Syl-E1, CG1 to CG6, with admixed accessions. Panel B: CV error vs. K value. Panel C: K1 to K8 ancestry proportions for V. sylvestris groups (Syl-W1, Syl-W2, Syl-E1, Syl-E2, Syl-Admix1 to Syl-Admix6) and V. vinifera groups (CG1 to CG6, C-Admix1 to C-Admix3).]

Fig. S8. Categorization of core accessions according to ancestry. (A) ADMIXTURE
clustering of core accessions from K=2 to 8. (B) Cross-validation error plot for the unsupervised
ADMIXTURE analysis. (C) Hierarchical clustering of ancestral components at K=8 to order and
sort core accessions. Syl-W, V. sylvestris western ecotype; Syl-E, V. sylvestris eastern ecotype;
CG, cultivated grapevine.

[Fig. S9 image. Panel A: ancestry % of representative cultivars for CG1 Western Asia Table, CG2 Caucasus Wine, CG3 Muscat, CG4 Balkan Wine, CG5 Iberian Wine, and CG6 Western European Wine. Panels B and C: ancestry % of admixed cultivars. Color key: K1 (Syl-W1), K2 (Syl-E1; CG1), K3 (CG6), K4 (CG4), K5 (CG3), K6 (Syl-E2; CG2), K7 (CG5), K8 (Syl-W2). Panel D: tri-plot of K2 (red), K5 (purple), and other Ks, with accessions marked as Table, Table/Wine, Wine, Raisin/Other, or Unknown.]

Fig. S9. V. vinifera accessions according to ancestry. (A) Representative cultivars from the six
V. vinifera groups (CG1-CG6) with pure or close to pure ancestries. (B) Representative admixed
V. vinifera cultivars with two major ancestry sources. (C) Representative admixed accessions
with a sizeable wild western ecotype component (sky blue Syl-W1 and pink Syl-W2). (D) Tri-
plot of V. vinifera cultivars according to the proportions of K2, K5, and the other Ks, showing K2
and K5 ancestries are associated with table grapevines and all other ancestries with wine
grapevines. Panels A, B, and C share the same ancestry color scheme. Syl-W, V. sylvestris
western ecotype; Syl-E, V. sylvestris eastern ecotype; CG, cultivated grapevine.

Fig. S10. Categorization of core accessions according to archetypal analysis. The graphs
show the projections of grapevine accessions with different numbers of archetypes (K=3 to
10). Eight archetypes can differentiate major grapevine ancestries obtained from the
ADMIXTURE analysis. Higher archetypes at K=9 and K=10 show overfitting and the mixture of
CG4 and CG5 accessions. Uncategorized and admixed accessions are greyed out.

[Fig. S11 image. Panel A: pairwise FST heatmap (scale 0 to 0.30) among CG1 to CG6, Syl-W1, Syl-W2, Syl-E2, and Syl-E1. Panel B: nucleotide diversity (π) per group. Panel C: heterozygosity per group. Groups on the x axes of B and C: Syl-W1, Syl-W2, Syl-E2, Syl-E1, CG1 to CG6; letters a to e mark the mean comparison groupings.]

Fig. S11. Genetic diversity of major grapevine groups with distinct ancestry. (A) Pairwise
fixation index FST of major grapevine groups. Yellow color represents larger population
differentiation. Two red boxes show that CG1 is closer to Syl-E1 and CG2 is closer to Syl-E2.
(B) Nucleotide diversity (π, 100 kb window size) distribution of major grapevine groups. (C)
Individual heterozygosity distribution of major grapevine groups. Solid and dashed lines
represent median and interquartile range. White diamonds represent mean values. For mean
comparisons, P<0.05 for a<b<e<c<d from Brown-Forsythe and Welch ANOVA test with
Games-Howell post hoc multiple comparisons. Graph drawn according to the ancestry color
palette. Syl-W, V. sylvestris western ecotype; Syl-E, V. sylvestris eastern ecotype; CG, cultivated
grapevine.
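
The quantities plotted in panels B and C can be illustrated with a minimal sketch. It assumes biallelic SNPs summarized as alternate-allele counts, a window of known length (100 kb in the caption), and genotypes coded as alternate-allele dosages; the function names and inputs are hypothetical, and this is not the authors' pipeline.

```python
from typing import Sequence


def window_pi(alt_counts: Sequence[int], n_chrom: int, window_bp: int = 100_000) -> float:
    """Per-site nucleotide diversity for one window.

    alt_counts: alternate-allele count at each biallelic SNP in the window.
    n_chrom:    number of sampled chromosomes (2 x number of diploid accessions).
    window_bp:  window length in bp; monomorphic sites contribute zero.
    """
    if n_chrom < 2:
        raise ValueError("need at least two chromosomes")
    total = 0.0
    for k in alt_counts:
        # k * (n - k) differing pairs out of n * (n - 1) / 2 possible pairs.
        total += 2.0 * k * (n_chrom - k) / (n_chrom * (n_chrom - 1))
    return total / window_bp


def individual_heterozygosity(dosages: Sequence[int]) -> float:
    """Fraction of heterozygous calls; dosages are coded 0, 1, or 2."""
    if not dosages:
        raise ValueError("no genotype calls")
    return sum(1 for g in dosages if g == 1) / len(dosages)
```

Whether heterozygosity is normalized by called SNPs or by all callable sites changes its absolute scale, so the denominator used here is an assumption.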

[Figure S12 image: (A, B) LD (r²) versus distance (0 to 20 Kb) for the V. sylvestris groups (Syl-W1, Syl-W2, Syl-E1, Syl-E2) and the V. vinifera groups (CG1 to CG6); (C) LD (r²) versus distance (0 to 200 Kb) for V. vinifera, V. sylvestris, and the total; (D) r² at 1 Kb versus nucleotide diversity (π) for each group.]

Fig. S12. Linkage disequilibrium in the major grapevine groups. Linkage disequilibrium
(LD, r²) decay of V. sylvestris (A) and V. vinifera (B) major groups both show that grapes of
Western Asian (red lines) and Caucasian (teal lines) descent have the shortest LD extents, at
around 400–500 bp. (C) LD decay of V. sylvestris is only slightly slower than that of V.
vinifera. (D) Inverse correlation of LD at 1 Kb and nucleotide diversity (π) from major grapevine
groups. Graph drawn according to the ancestry color palette. Syl-W, V. sylvestris western
ecotype; Syl-E, V. sylvestris eastern ecotype; CG, cultivated grapevine.
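
As a reminder of what the r² axis measures, the following minimal sketch computes r² between two biallelic loci. It assumes phased haplotypes coded 0/1, which is a simplification; this section does not state which software produced the decay curves, and the function name is hypothetical.

```python
from typing import Sequence


def ld_r2(hap_a: Sequence[int], hap_b: Sequence[int]) -> float:
    """r^2 between two biallelic loci from phased haplotypes coded 0/1."""
    if len(hap_a) != len(hap_b) or not hap_a:
        raise ValueError("haplotype vectors must be non-empty and of equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n                      # allele-1 frequency at locus A
    p_b = sum(hap_b) / n                      # allele-1 frequency at locus B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    return d * d / denom
```

Averaging such pairwise values within distance bins gives a decay curve of the kind shown in panels A to C.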

[Figure S13 image: (A) Ne (10^4 to 10^7) versus years (10^3 to 10^7; g=3, µ=5.4×10^-9) for Syl-W1, Syl-W2, Syl-E1, and Syl-E2; (B) RCCR (0 to 1.0) versus years (10^3 to 10^6; g=3, µ=5.4×10^-9) for the Syl-W1/Syl-W2, Syl-E1/Syl-E2, Syl-E1/Syl-W1, Syl-E1/Syl-W2, Syl-E2/Syl-W1, and Syl-E2/Syl-W2 pairs; (C) Ne (×10^3) versus time (×10^3 years ago) for Syl-E1, Syl-E2, Syl-W1, and Syl-W2.]

Fig. S13. Demographic history of V. sylvestris grapevines. (A) Representative demographic
histories of V. sylvestris populations from 10^7 to 10^3 years ago deduced from MSMC2. Each line
shows estimation from eight haplotypes of four accessions. (B) Representative split lines among
V. sylvestris populations based on relative cross-coalescence rate (RCCR) analyses from
MSMC2. (C) Demographic histories of V. sylvestris populations deduced from Stairway Plot 2.
Red line: median of 200 inferences. Black line: 75% confidence interval. Grey line: 95%
confidence interval. Syl-W, V. sylvestris western ecotype; Syl-E, V. sylvestris eastern ecotype;
CG, cultivated grapevine.
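
The axes of panel A are labeled with g=3 and µ=5.4×10^-9. A minimal sketch of the usual conversion from mutation-scaled coalescent output to years and Ne is given below. It assumes input in the scaled units that MSMC-style tools report (time boundaries and coalescence rates scaled by the mutation rate); the function names are hypothetical, and this is not the authors' script.

```python
MU = 5.4e-9   # mutation rate per site per generation (from the axis label)
GEN = 3.0     # generation time in years (from the axis label)


def scaled_time_to_years(t_scaled: float, mu: float = MU, gen: float = GEN) -> float:
    """Convert a mutation-scaled time boundary to calendar years."""
    return t_scaled / mu * gen


def rate_to_ne(coal_rate: float, mu: float = MU) -> float:
    """Convert a mutation-scaled coalescence rate to effective population size."""
    if coal_rate <= 0:
        raise ValueError("coalescence rate must be positive")
    return (1.0 / coal_rate) / (2.0 * mu)
```

Under these conversions the time axis scales linearly with the assumed generation time, whereas Ne depends only on the mutation rate.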

Fig. S14. Ecological niche modeling of the suitable habitats for V. sylvestris ecotypes. The
times are at the Pleistocene Last Interglacial (~130 Kya), the Last Glacial Maximum (~21 Kya),
and early Holocene (~11.7-8.3 Kya). The color scale shows suitability score.

[Figure S15 panels. Momi2 model comparison, n=5 runs per model.]
(A) Models without gene flow
Model 1: Dual origin. AIC: 82834241.44 ± 17.11
Model 2: Single origin from Syl-E1. AIC: 84987179.61 ± 3.72
Model 3: Single origin from Syl-E2. AIC: 83196848.50 ± 9.52
(B) Models with one gene flow
Model 1: Dual origin, CG1 to CG2. AIC: 82691092.94 ± 24.77
Model 2: Dual origin, CG2 to CG1. AIC: 82141082.18 ± 151.98
Model 3: Dual origin, Syl-E1 to CG2. AIC: 82824177.83 ± 6.34
Model 4: Dual origin, Syl-E2 to CG1. AIC: 82152522.60 ± 17.81
Model 5: Single origin from Syl-E1, Syl-E2 to CG2. AIC: 83102162.81 ± 3.93
Model 6: Single origin from Syl-E1, Syl-E2 to CG1. AIC: 84968655.26 ± 0.41
Model 7: Single origin from Syl-E2, Syl-E1 to CG1. AIC: 82147876.48 ± 75.69
Model 8: Single origin from Syl-E2, Syl-E1 to CG2. AIC: 83196842.76 ± 1.58
(C) Outgroup f3 biplots: f3 (Syl-W1,X; Rotund) versus f3 (Syl-W2,X; Rotund), f3 (Syl-E1,X; Rotund) versus f3 (Syl-E2,X; Rotund), and f3 (CG1,X; Rotund) versus f3 (CG2,X; Rotund).

Fig. S15. Dual domestication of CG1 and CG2. (A, B) Phylogenetic model comparison
without gene flow or with one gene flow using Momi2 supports a dual origin of CG1 and CG2
with the lowest AIC values. (C) Outgroup f3 statistics biplots measuring genetic similarity
between CGs, Syl-W, and Syl-E. Rotund, Muscadinia rotundifolia. Stars mark the f3 statistics for
Syl-W1/Syl-W2, Syl-E1/Syl-E2, and CG1/CG2 pairs, respectively.
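
The AIC comparison in panel A can be restated as a small ranking exercise. The sketch below uses the AIC values printed in panel A (the number before the ± sign); the helper name is hypothetical, and the code only ranks models by AIC difference, without reproducing the Momi2 inference itself.

```python
from typing import Dict, List, Tuple


def rank_by_aic(aic_by_model: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (model, delta AIC) pairs sorted from lowest to highest AIC."""
    if not aic_by_model:
        raise ValueError("no models supplied")
    best = min(aic_by_model.values())
    return sorted(((name, aic - best) for name, aic in aic_by_model.items()),
                  key=lambda item: item[1])


# AIC values transcribed from panel A (models without gene flow).
panel_a = {
    "Dual origin": 82834241.44,
    "Single origin from Syl-E1": 84987179.61,
    "Single origin from Syl-E2": 83196848.50,
}
ranking = rank_by_aic(panel_a)
for name, delta in ranking:
    print(f"{name}: delta AIC = {delta:.2f}")
```

The model with a delta AIC of zero is the preferred one; here that is the dual-origin model, in line with the caption.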

[Figure S16 image: four panels (Syl-W1, Syl-W2, Syl-E1, Syl-E2), each showing RCCR (0 to 1.0) between that V. sylvestris population and CG1 to CG6 versus years (10^3 to 10^6; g=3, µ=5.4×10^-9).]

Fig. S16. Population split between V. sylvestris and V. vinifera. Representative split lines
between each V. sylvestris population and all V. vinifera groups based on relative cross-
coalescence rate (RCCR) analyses from MSMC2.

[Figure S17 image: for each outgroup setting (A, M. rotundifolia; B, Syl-E1), TreeMix trees at m=0 and at the optimal m (x-axis: drift parameter; color scale: migration weight, 0 to 0.5), the corresponding residual matrices (scale ±69.4 SE and ±10.1 SE in A; ±66.4 SE and ±13.9 SE in B), and plots of mean L(m) ± SD, variance explained, and Δm against m (migration edges, 0 to 10).]

Fig. S17. Introgression of Syl-W and the origination of European grapevines. (A) Outgroup
is set as M. rotundifolia. TreeMix analysis with zero and five migration edges (m=5). Optimal m
number shown by the red circle. Residual matrices are shown. Five migration edges increase the
proportion of variance explained from 96.9% (m=0) to 99.9%. Overfitting of the tree due to
outgroup selection is indicated by a dubious “migration” from Syl-E1 to M. rotundifolia. (B)
Outgroup is set as Syl-E1 to avoid overfitting. TreeMix analysis with zero and four migration
edges (m=4). Optimal m number shown by the red circle. Residual matrices are shown. Four
migration edges increase the proportion of variance explained from 90.2% (m=0) to 99.5%.

[Figure S18 image: density of introgression tracts (color scale 0 to 0.6) along chromosomes 1 to 19 (positions 0 to 30 Mb) for CG3, CG4, CG5, and CG6.]

Fig. S18. Local introgression tracts of Syl-W in four V. vinifera grapevine groups. The color
scheme shows the relative density of identified introgression tracts. Each tract contains 50 SNPs.

[Figure S19 image: network with nodes colored by group (C-Admix, CG1 to CG6, Syl-Admix, Syl-E1, Syl-E2, Syl-W1, Syl-W2); size legend: 10 and 1; the fv haplotype is labeled.]

Fig. S19. Median-joining network of f and fv sex determination region haplotypes. The fv
haplotype is shown by a square. The f haplotype is shown by a circle.

[Figure S20 image: network with nodes colored by group (C-Admix, CG1 to CG6, Syl-Admix, Syl-E1, Syl-E2, Syl-W1, Syl-W2); size legend: 10 and 1; the Mv haplogroup is labeled.]

Fig. S20. Median-joining network of M and Mv sex determination region haplotypes. The
Mv haplotype is shown by a square. The M haplotype is shown by a circle.

[Figure S21 image: network with nodes colored by group (C-Admix, CG1 to CG6, Syl-Admix, Syl-E1, Syl-E2, Syl-W1, Syl-W2); size legend: 10 and 1; the H3 haplotype is labeled.]

Fig. S21. Median-joining network of H1 and H3 sex determination region haplotypes. The
H3 haplotype is shown by a square. The H1 haplotype is shown by a circle.

[Figure S22 image: network with nodes colored by group (C-Admix, CG1 to CG6, Syl-Admix, Syl-E1, Syl-E2, Syl-W1, Syl-W2); size legend: 10 and 1; labels in the image: H2, H4, H5.]

Fig. S22. Median-joining network of H2, H3, and H5 sex determination region haplotypes.

[Figure S23 image: bar chart of the number of accessions (y-axis: Number, 0 to 400) in CG1 to CG6 by SDR genotype (f/f, H1/f, H1/H1, H1/H2, H2/f, Other).]

Fig. S23. Distribution of SDR genotypes in the six major grapevine groups.

Fig. S24. Grapevine group CG3 and muscat flavor. (A) Geographic distribution of CG3
grapevines. (B) Identification of SNPs associated with muscat flavor using FastGWA-GLMM.
The significance threshold is set at -log10(p)=6.0. (C) Zoomed-in genomic regions with
significant SNP signatures. Genes closest to the SNPs are colored in red. The non-synonymous
SNP Chr5:19419698 and the corresponding VvDXS gene are shown in blue.

[Figure S25 image: (A, B) counts (0 to 800) and percentages (0% to 100%) of accessions per berry skin color class (green-yellow, rose, red, red-black), split by group (CG1 to CG6, Admix) and by utilization (Table, Wine, Table/Wine, Other); (C) Manhattan plot of -log10(P) across chromosomes 1 to 19 and Q-Q plot of observed versus expected -log10(P), λ=1.16; (D) chromosome 2 regions (3.512 to 3.520 Mb, 4.98 to 5.10 Mb, and 16.05 to 16.09 Mb) with exonic and non-exonic SNPs, labeled SNPs 2:3521538 A/T, 2:5054627 G/T, 2:5116947 G/T, and 2:16051309 T/A, and genes Vvsyl02G000229, Vvsyl02G000303, Vvsyl02G000310, Vvsyl02G000314, and Vvsyl02G001064, with VvMybA3, VvMybA1, and VvMybA2 labeled; (E) genotype percentages (Ref/Ref, Ref/Alt, Alt/Alt, -/-) at the four labeled SNPs for green-yellow, rose, red, and red-black cultivars and V. sylvestris.]

Fig. S25. Novel genes associated with berry skin color. (A, B) Categorization of cultivated
grapevine according to berry skin color (green-yellow, rose, red, and red-black). No population
stratification is observed for major groups or grapevine utilization. (C) Identification of SNPs
associated with berry skin color using MLMA-LOCO. The significance threshold is set at
-log10(p)=6.0. Genomic inflation factor λ=1.16. The top SNP signals are shown in the dashed square. (D)
Zoomed-in genomic regions with the top SNP signatures in chromosome 2. Pink represents non-
exonic SNPs. Dark red represents exonic SNPs. Relevant genes closest to the SNPs are shown.
Blue and yellow blocks represent exons and introns respectively. Four representative top non-
synonymous SNPs are labeled. Alternative splicing transcripts exist for the Vvsyl02G001064
gene. (E) The association of genotypes for representative SNPs (Ref/Alt) with berry skin color.
V. sylvestris has red berries. Ref: reference allele. Alt: alternative allele.
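
The genomic inflation factor reported in panel C is conventionally the median observed 1-df chi-square statistic divided by its expected median under the null. A minimal sketch of that convention follows, assuming two-sided p-values as input; the function name is hypothetical, and the section does not state how the authors computed the value.

```python
from statistics import NormalDist, median
from typing import Sequence

_STD_NORMAL = NormalDist()


def genomic_inflation(p_values: Sequence[float]) -> float:
    """Median observed 1-df chi-square statistic over its expected median."""
    if not p_values:
        raise ValueError("no p-values supplied")
    chi2 = []
    for p in p_values:
        if not 0.0 < p <= 1.0:
            raise ValueError("p-values must lie in (0, 1]")
        # Lower-tail quantile keeps precision for very small p-values.
        z = _STD_NORMAL.inv_cdf(p / 2.0)
        chi2.append(z * z)
    expected_median = _STD_NORMAL.inv_cdf(0.25) ** 2   # about 0.455
    return median(chi2) / expected_median
```

A value near 1 indicates little inflation of the test statistics; values above 1 indicate that the observed statistics are larger than expected under the null.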

Fig. S26. The impact of various generation times on the population split times inferred by
MSMC2. Estimated split times of Syl-E1/CG1 and Syl-E2/CG2 population pairs using relative
cross-coalescence rate (0.5) analyses with MSMC2. Four haplotypes in each population with 100
runs for each comparison. The comparisons were kept the same across five generation times. Red
bars, median value with 95% confidence interval.
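
The dependence of split time on generation time can be sketched as follows. The code assumes a series of mutation-scaled times with matching RCCR values, takes the split as the most recent point where RCCR rises through 0.5 (linear interpolation), and converts it to years with µ=5.4×10^-9 as used on the MSMC2 plots. The function names, the toy series, and the generation times other than g=3 are hypothetical.

```python
from typing import Sequence

MU = 5.4e-9  # mutation rate per site per generation used on the MSMC2 plots


def split_time_scaled(times: Sequence[float], rccr: Sequence[float],
                      level: float = 0.5) -> float:
    """Most recent scaled time at which RCCR rises through `level`.

    `times` must be increasing (recent to ancient); linear interpolation is
    applied between the two bracketing time points.
    """
    if len(times) != len(rccr) or len(times) < 2:
        raise ValueError("need two equal-length series with at least two points")
    for i in range(1, len(times)):
        lo, hi = rccr[i - 1], rccr[i]
        if lo < level <= hi:
            frac = (level - lo) / (hi - lo)
            return times[i - 1] + frac * (times[i] - times[i - 1])
    raise ValueError("RCCR never crosses the requested level")


def split_time_years(t_scaled: float, gen_years: float, mu: float = MU) -> float:
    """Scaled time to years; the result is proportional to generation time."""
    return t_scaled / mu * gen_years


# Toy series (hypothetical values, for illustration only).
t_split = split_time_scaled([1e-5, 2e-5, 3e-5], [0.1, 0.4, 0.8])
years_by_g = {g: split_time_years(t_split, g) for g in (1.0, 3.0, 5.0)}
```

Because the conversion is linear in the generation time, changing g rescales every estimated split time by the same factor without changing the order of the population pairs.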

[Figure S27 panels. Momi2 models, n=5 runs per model; time labels on the trees: ~11,000 Kya and ~8,000 Kya.]
(A) Populations: Syl-E1, CG1, Syl-E2 (plus Ghost1 in Model 2)
Model 1: Syl-E1 as progenitor. AIC: 63399249.15 ± 19.74
Model 2: Syl-E1 related ghost population as progenitor. AIC: 63650398.33 ± 62.67
(B) Populations: Syl-E1, CG2, Syl-E2 (plus Ghost2 in Model 2)
Model 1: Syl-E2 as progenitor. AIC: 61301067.74 ± 2.37
Model 2: Syl-E2 related ghost population as progenitor. AIC: 61583922.62 ± 80.41

Fig. S27. Momi2 inference of trees with and without extinct progenitor populations. The
results do not support the descent of CG1 (A) or CG2 (B) from extinct progenitor
populations.

References and Notes
1. P. E. McGovern, U. Hartung, V. R. Badler, D. L. Glusker, L. J. Exner, The beginnings of
winemaking and viniculture in the ancient Near East and Egypt. Expedition 39, 3–21
(1997).
2. P. This, T. Lacombe, M. R. Thomas, Historical origins and genetic diversity of wine grapes.
Trends Genet. 22, 511–519 (2006). doi:10.1016/j.tig.2006.07.008
3. F. Grassi, G. De Lorenzis, Back to the origins: Background and perspectives of grapevine
domestication. Int. J. Mol. Sci. 22, 4518 (2021). doi:10.3390/ijms22094518
4. D. Cantu, M. A. Walker, The Grape Genome (Springer Nature, 2019);
https://doi.org/10.1007/978-3-030-18601-2.
5. D. Zohary, M. Hopf, E. Weiss, Domestication of Plants in the Old World: The Origin and
Spread of Domesticated Plants in Southwest Asia, Europe, and the Mediterranean Basin
(Oxford Univ. Press, 2012).
6. S. Myles, A. R. Boyko, C. L. Owens, P. J. Brown, F. Grassi, M. K. Aradhya, B. Prins, A.
Reynolds, J.-M. Chia, D. Ware, C. D. Bustamante, E. S. Buckler, Genetic structure and
domestication history of the grape. Proc. Natl. Acad. Sci. U.S.A. 108, 3530–3535 (2011).
doi:10.1073/pnas.1009363108
7. Y. Zhou, M. Massonnet, J. S. Sanjak, D. Cantu, B. S. Gaut, Evolutionary genomics of grape
(Vitis vinifera ssp. vinifera) domestication. Proc. Natl. Acad. Sci. U.S.A. 114, 11715–
11720 (2017). doi:10.1073/pnas.1709257114
8. Z. Liang, S. Duan, J. Sheng, S. Zhu, X. Ni, J. Shao, C. Liu, P. Nick, F. Du, P. Fan, R. Mao, Y.
Zhu, W. Deng, M. Yang, H. Huang, Y. Liu, Y. Ding, X. Liu, J. Jiang, Y. Zhu, S. Li, X.
He, W. Chen, Y. Dong, Whole-genome resequencing of 472 Vitis accessions for
grapevine diversity and demographic history analyses. Nat. Commun. 10, 1190 (2019).
doi:10.1038/s41467-019-09135-8
9. A. Sivan, O. Rahimi, B. Lavi, M. Salmon‐Divon, E. Weiss, E. Drori, S. Hübner, Genomic
evidence supports an independent history of Levantine and Eurasian grapevines. Plants
People Planet 3, 414–427 (2021). doi:10.1002/ppp3.10197
10. S. Freitas, M. A. Gazda, M. Â. Rebelo, A. J. Muñoz-Pajares, C. Vila-Viçosa, A. Muñoz-
Mérida, L. M. Gonçalves, D. Azevedo-Silva, S. Afonso, I. Castro, P. H. Castro, M.
Sottomayor, A. Beja-Pereira, J. Tereso, N. Ferrand, E. Gonçalves, A. Martins, M.
Carneiro, H. Azevedo, Pervasive hybridization with local wild relatives in Western
European grapevine varieties. Sci. Adv. 7, eabi8584 (2021). doi:10.1126/sciadv.abi8584
11. G. Magris, I. Jurman, A. Fornasiero, E. Paparelli, R. Schwope, F. Marroni, G. Di Gaspero,
M. Morgante, The genomes of 204 Vitis vinifera accessions reveal the origin of European
wine grapes. Nat. Commun. 12, 7240 (2021). doi:10.1038/s41467-021-27487-y
12. S. Riaz, G. De Lorenzis, D. Velasco, A. Koehmstedt, D. Maghradze, Z. Bobokashvili, M.
Musayev, G. Zdunic, V. Laucou, M. Andrew Walker, O. Failla, J. E. Preece, M. Aradhya,
R. Arroyo-Garcia, Genetic diversity analysis of cultivated and wild grapevine (Vitis
vinifera L.) accessions around the Mediterranean basin and Central Asia. BMC Plant
Biol. 18, 137 (2018). doi:10.1186/s12870-018-1351-0
13. R. Arroyo-García, L. Ruiz-García, L. Bolling, R. Ocete, M. A. López, C. Arnold, A. Ergul,
G. Söylemezoğlu, H. I. Uzun, F. Cabello, J. Ibáñez, M. K. Aradhya, A. Atanassov, I.
Atanassov, S. Balint, J. L. Cenis, L. Costantini, S. Goris-Lavets, M. S. Grando, B. Y.
Klein, P. E. McGovern, D. Merdinoglu, I. Pejic, F. Pelsy, N. Primikirios, V.
Risovannaya, K. A. Roubelakis-Angelakis, H. Snoussi, P. Sotiri, S. Tamhankar, P. This,
L. Troshin, J. M. Malpica, F. Lefort, J. M. Martinez-Zapater, Multiple origins of
cultivated grapevine (Vitis vinifera L. ssp. sativa) based on chloroplast DNA
polymorphisms. Mol. Ecol. 15, 3707–3714 (2006). doi:10.1111/j.1365-294X.2006.03049.x
14. P. McGovern, M. Jalabadze, S. Batiuk, M. P. Callahan, K. E. Smith, G. R. Hall, E.
Kvavadze, D. Maghradze, N. Rusishvili, L. Bouby, O. Failla, G. Cola, L. Mariani, E.
Boaretto, R. Bacilieri, P. This, N. Wales, D. Lordkipanidze, Early Neolithic wine of
Georgia in the South Caucasus. Proc. Natl. Acad. Sci. U.S.A. 114, E10309–E10318
(2017). doi:10.1073/pnas.1714728114
15. J. Ramos-Madrigal, A. K. W. Runge, L. Bouby, T. Lacombe, J. A. Samaniego Castruita, A.-
F. Adam-Blondon, I. Figueiral, C. Hallavant, J. M. Martínez-Zapater, C. Schaal, R.
Töpfer, B. Petersen, T. Sicheritz-Pontén, P. This, R. Bacilieri, M. T. P. Gilbert, N. Wales,
Palaeogenomic insights into the origins of French grapevine diversity. Nat. Plants 5,
595–603 (2019). doi:10.1038/s41477-019-0437-5
16. See the supplementary materials.
17. M. J. Roach, D. L. Johnson, J. Bohlmann, H. J. J. van Vuuren, S. J. M. Jones, I. S. Pretorius,
S. A. Schmidt, A. R. Borneman, Population sequencing reveals clonal diversity and
ancestral inbreeding in the grapevine cultivar Chardonnay. PLOS Genet. 14, e1007807
(2018). doi:10.1371/journal.pgen.1007807
18. T. Lacombe, J.-M. Boursiquot, V. Laucou, M. Di Vecchi-Staraz, J.-P. Péros, P. This, Large-
scale parentage analysis in an extended set of grapevine cultivars (Vitis vinifera L.).
Theor. Appl. Genet. 126, 401–414 (2013). doi:10.1007/s00122-012-1988-2
19. R. Bacilieri, T. Lacombe, L. Le Cunff, M. Di Vecchi-Staraz, V. Laucou, B. Genna, J.-P.
Péros, P. This, J.-M. Boursiquot, Genetic structure in cultivated grapevines is linked to
geography and human selection. BMC Plant Biol. 13, 25–25 (2013). doi:10.1186/1471-2229-13-25
20. F. Mercati, G. De Lorenzis, A. Mauceri, M. Zerbo, L. Brancadoro, C. D’Onofrio, C. Morcia,
M. G. Barbagallo, C. Bignami, M. Gardiman, L. de Palma, P. Ruffa, V. Novello, M.
Crespan, F. Sunseri, Integrated Bayesian Approaches Shed Light on the Dissemination
Routes of the Eurasian Grapevine Germplasm. Front. Plant Sci. 12, 692661 (2021).
doi:10.3389/fpls.2021.692661
21. R. Hosfield, J. Cole, Early hominins in north-west Europe: A punctuated long chronology?
Quat. Sci. Rev. 190, 148–160 (2018). doi:10.1016/j.quascirev.2018.04.026
22. A. Timmermann, K.-S. Yun, P. Raia, J. Ruan, A. Mondanaro, E. Zeller, C. Zollikofer, M.
Ponce de León, D. Lemmon, M. Willeit, A. Ganopolski, Climate effects on archaic
human habitats and species successions. Nature 604, 495–501 (2022).
doi:10.1038/s41586-022-04600-9
23. E. C. Corrick, R. N. Drysdale, J. C. Hellstrom, E. Capron, S. O. Rasmussen, X. Zhang, D.
Fleitmann, I. Couchoud, E. Wolff, Synchronous timing of abrupt climate changes during
the last glacial period. Science 369, 963–969 (2020). doi:10.1126/science.aay5538
24. M. Engel, H. Brückner, A. Pint, K. Wellbrock, A. Ginau, P. Voss, M. Grottker, N. Klasen, P.
Frenzel, The early Holocene humid period in NW Saudi Arabia – Sediments, microfossils
and palaeo-hydrological modelling. Quat. Int. 266, 131–141 (2012).
doi:10.1016/j.quaint.2011.04.028
25. C. J. Stevens, C. Murphy, R. Roberts, L. Lucas, F. Silva, D. Q. Fuller, Between China and
South Asia: A Middle Asian corridor of crop dispersal and agricultural innovation in the
Bronze Age. Holocene 26, 1541–1555 (2016). doi:10.1177/0959683616650268
26. I. Lazaridis, D. Nadel, G. Rollefson, D. C. Merrett, N. Rohland, S. Mallick, D. Fernandes, M.
Novak, B. Gamarra, K. Sirak, S. Connell, K. Stewardson, E. Harney, Q. Fu, G. Gonzalez-
Fortes, E. R. Jones, S. A. Roodenberg, G. Lengyel, F. Bocquentin, B. Gasparian, J. M.
Monge, M. Gregg, V. Eshed, A.-S. Mizrahi, C. Meiklejohn, F. Gerritsen, L. Bejenaru, M.
Blüher, A. Campbell, G. Cavalleri, D. Comas, P. Froguel, E. Gilbert, S. M. Kerr, P.
Kovacs, J. Krause, D. McGettigan, M. Merrigan, D. A. Merriwether, S. O’Reilly, M. B.
Richards, O. Semino, M. Shamoon-Pour, G. Stefanescu, M. Stumvoll, A. Tönjes, A.
Torroni, J. F. Wilson, L. Yengo, N. A. Hovhannisyan, N. Patterson, R. Pinhasi, D. Reich,
Genomic insights into the origin of farming in the ancient Near East. Nature 536, 419–
424 (2016). doi:10.1038/nature19310
27. C.-C. Wang, S. Reinhold, A. Kalmykov, A. Wissgott, G. Brandt, C. Jeong, O. Cheronet, M.
Ferry, E. Harney, D. Keating, S. Mallick, N. Rohland, K. Stewardson, A. R. Kantorovich,
V. E. Maslov, V. G. Petrenko, V. R. Erlikh, B. Ch. Atabiev, R. G. Magomedov, P. L.
Kohl, K. W. Alt, S. L. Pichler, C. Gerling, H. Meller, B. Vardanyan, L. Yeganyan, A. D.
Rezepkin, D. Mariaschk, N. Berezina, J. Gresky, K. Fuchs, C. Knipper, S. Schiffels, E.
Balanovska, O. Balanovsky, I. Mathieson, T. Higham, Y. B. Berezin, A. Buzhilova, V.
Trifonov, R. Pinhasi, A. B. Belinskij, D. Reich, S. Hansen, J. Krause, W. Haak, Ancient
human genome-wide data from a 3000-year interval in the Caucasus corresponds with
eco-geographic regions. Nat. Commun. 10, 590 (2019). doi:10.1038/s41467-018-08220-8
28. R. Pinhasi, J. Fort, A. J. Ammerman, Tracing the origin and spread of agriculture in Europe.
PLOS Biol. 3, e410 (2005). doi:10.1371/journal.pbio.0030410
29. I. Mathieson, S. Alpaslan-Roodenberg, C. Posth, A. Szécsényi-Nagy, N. Rohland, S.
Mallick, I. Olalde, N. Broomandkhoshbacht, F. Candilio, O. Cheronet, D. Fernandes, M.
Ferry, B. Gamarra, G. G. Fortes, W. Haak, E. Harney, E. Jones, D. Keating, B. Krause-
Kyora, I. Kucukkalipci, M. Michel, A. Mittnik, K. Nägele, M. Novak, J. Oppenheimer,
N. Patterson, S. Pfrengle, K. Sirak, K. Stewardson, S. Vai, S. Alexandrov, K. W. Alt, R.
Andreescu, D. Antonović, A. Ash, N. Atanassova, K. Bacvarov, M. B. Gusztáv, H.
Bocherens, M. Bolus, A. Boroneanţ, Y. Boyadzhiev, A. Budnik, J. Burmaz, S.
Chohadzhiev, N. J. Conard, R. Cottiaux, M. Čuka, C. Cupillard, D. G. Drucker, N.
Elenski, M. Francken, B. Galabova, G. Ganetsovski, B. Gély, T. Hajdu, V. Handzhyiska,
K. Harvati, T. Higham, S. Iliev, I. Janković, I. Karavanić, D. J. Kennett, D. Komšo, A.
Kozak, D. Labuda, M. Lari, C. Lazar, M. Leppek, K. Leshtakov, D. L. Vetro, D. Los, I.
Lozanov, M. Malina, F. Martini, K. McSweeney, H. Meller, M. Menđušić, P. Mirea, V.
Moiseyev, V. Petrova, T. D. Price, A. Simalcsik, L. Sineo, M. Šlaus, V. Slavchev, P.
Stanev, A. Starović, T. Szeniczey, S. Talamo, M. Teschler-Nicola, C. Thevenet, I.
Valchev, F. Valentin, S. Vasilyev, F. Veljanovska, S. Venelinova, E. Veselovskaya, B.
Viola, C. Virag, J. Zaninović, S. Zäuner, P. W. Stockhammer, G. Catalano, R. Krauß, D.
Caramelli, G. Zariņa, B. Gaydarska, M. Lillie, A. G. Nikitin, I. Potekhina, A.
Papathanasiou, D. Borić, C. Bonsall, J. Krause, R. Pinhasi, D. Reich, The genomic
history of southeastern Europe. Nature 555, 197–203 (2018). doi:10.1038/nature25778
30. R. Fregel, F. L. Méndez, Y. Bokbot, D. Martín-Socas, M. D. Camalich-Massieu, J. Santana,
J. Morales, M. C. Ávila-Arcos, P. A. Underhill, B. Shapiro, G. Wojcik, M. Rasmussen,
A. E. R. Soares, J. Kapp, A. Sockell, F. J. Rodríguez-Santos, A. Mikdad, A. Trujillo-
Mederos, C. D. Bustamante, Ancient genomes from North Africa evidence prehistoric
migrations to the Maghreb from both the Levant and Europe. Proc. Natl. Acad. Sci.
U.S.A. 115, 6774–6779 (2018). doi:10.1073/pnas.1800851115
31. I. Olalde, S. Mallick, N. Patterson, N. Rohland, V. Villalba-Mouco, M. Silva, K. Dulias, C. J.
Edwards, F. Gandini, M. Pala, P. Soares, M. Ferrando-Bernal, N. Adamski, N.
Broomandkhoshbacht, O. Cheronet, B. J. Culleton, D. Fernandes, A. M. Lawson, M.
Mah, J. Oppenheimer, K. Stewardson, Z. Zhang, J. M. Jiménez Arenas, I. J. Toro
Moyano, D. C. Salazar-García, P. Castanyer, M. Santos, J. Tremoleda, M. Lozano, P.
García Borja, J. Fernández-Eraso, J. A. Mujika-Alustiza, C. Barroso, F. J. Bermúdez, E.
Viguera Mínguez, J. Burch, N. Coromina, D. Vivó, A. Cebrià, J. M. Fullola, O. García-
Puchol, J. I. Morales, F. X. Oms, T. Majó, J. M. Vergès, A. Díaz-Carvajal, I. Ollich-
Castanyer, F. J. López-Cachero, A. M. Silva, C. Alonso-Fernández, G. Delibes de Castro,
J. Jiménez Echevarría, A. Moreno-Márquez, G. Pascual Berlanga, P. Ramos-García, J.
Ramos-Muñoz, E. Vijande Vila, G. Aguilella Arzo, Á. Esparza Arroyo, K. T. Lillios, J.
Mack, J. Velasco-Vázquez, A. Waterman, L. Benítez de Lugo Enrich, M. Benito
Sánchez, B. Agustí, F. Codina, G. de Prado, A. Estalrrich, Á. Fernández Flores, C.
Finlayson, G. Finlayson, S. Finlayson, F. Giles-Guzmán, A. Rosas, V. Barciela González,
G. García Atiénzar, M. S. Hernández Pérez, A. Llanos, Y. Carrión Marco, I. Collado
Beneyto, D. López-Serrano, M. Sanz Tormo, A. C. Valera, C. Blasco, C. Liesau, P. Ríos,
J. Daura, M. J. de Pedro Michó, A. A. Diez-Castillo, R. Flores Fernández, J. Francès
Farré, R. Garrido-Pena, V. S. Gonçalves, E. Guerra-Doce, A. M. Herrero-Corral, J. Juan-
Cabanilles, D. López-Reyes, S. B. McClure, M. Merino Pérez, A. Oliver Foix, M. Sanz
Borràs, A. C. Sousa, J. M. Vidal Encinas, D. J. Kennett, M. B. Richards, K. Werner Alt,
W. Haak, R. Pinhasi, C. Lalueza-Fox, D. Reich, The genomic history of the Iberian
Peninsula over the past 8000 years. Science 363, 1230–1234 (2019).
doi:10.1126/science.aav4040
32. S. Brunel, E. A. Bennett, L. Cardin, D. Garraud, H. Barrand Emam, A. Beylier, B. Boulestin,
F. Chenal, E. Ciesielski, F. Convertini, B. Dedet, S. Desbrosse-Degobertiere, S. Desenne,
J. Dubouloz, H. Duday, G. Escalon, V. Fabre, E. Gailledrat, M. Gandelin, Y. Gleize, S.
Goepfert, J. Guilaine, L. Hachem, M. Ilett, F. Lambach, F. Maziere, B. Perrin, S. Plouin,
E. Pinard, I. Praud, I. Richard, V. Riquier, R. Roure, B. Sendra, C. Thevenet, S. Thiol, E.
Vauquelin, L. Vergnaud, T. Grange, E.-M. Geigl, M. Pruvost, Ancient genomes from
present-day France unveil 7,000 years of its demographic history. Proc. Natl. Acad. Sci.
U.S.A. 117, 12791–12798 (2020). doi:10.1073/pnas.1918034117
33. C. Zou, M. Massonnet, A. Minio, S. Patel, V. Llaca, A. Karn, F. Gouker, L. Cadle-Davidson,
B. Reisch, A. Fennell, D. Cantu, Q. Sun, J. P. Londo, Multiple independent
recombinations led to hermaphroditism in grapevine. Proc. Natl. Acad. Sci. U.S.A. 118,
e2023548118 (2021). doi:10.1073/pnas.2023548118
34. F. Emanuelli, J. Battilana, L. Costantini, L. Le Cunff, J.-M. Boursiquot, P. This, M. S.
Grando, A candidate gene association study on muscat flavor in grapevine (Vitis vinifera
L.). BMC Plant Biol. 10, 241–241 (2010). doi:10.1186/1471-2229-10-241
35. S. Kobayashi, N. Goto-Yamamoto, H. Hirochika, Retrotransposon-induced mutations in
grape skin color. Science 304, 982–982 (2004). doi:10.1126/science.1095011
36. A. R. Walker, E. Lee, J. Bogs, D. A. J. McDavid, M. R. Thomas, S. P. Robinson, White
grapes arose through the mutation of two similar and adjacent regulatory genes. Plant J.
49, 772–785 (2007). doi:10.1111/j.1365-313X.2006.02997.x
37. A. R. Walker, E. Lee, S. P. Robinson, Two new grape cultivars, bud sports of Cabernet
Sauvignon bearing pale-coloured berries, are the result of deletion of two regulatory
genes of the berry colour locus. Plant Mol. Biol. 62, 623–635 (2006).
doi:10.1007/s11103-006-9043-9
38. P. J. Richerson, R. Boyd, R. L. Bettinger, Was Agriculture Impossible during the Pleistocene
but Mandatory during the Holocene? A Climate Change Hypothesis. Am. Antiq. 66, 387–
411 (2001). doi:10.2307/2694241
39. R. G. Allaby, C. J. Stevens, L. Kistler, D. Q. Fuller, Emerging evidence of plant
domestication as a landscape-level process. Trends Ecol. Evol. 37, 268–279 (2022).
doi:10.1016/j.tree.2021.11.002
40. R. S. Meyer, M. D. Purugganan, Evolution of crop species: Genetics of domestication and
diversification. Nat. Rev. Genet. 14, 840–852 (2013). doi:10.1038/nrg3605
41. Code for: Y. Dong, S. Duan, Q. Xia, Z. Liang, X. Dong, K. Margaryan, M. Musayev, S.
Goryslavets, G. Zdunić, P.-F. Bert, T. Lacombe, E. Maul, P. Nick, K. Bitskinashvili, G.
D. Bisztray, E. Drori, G. De Lorenzis, J. Cunha, C. F. Popescu, R. Arroyo-Garcia, C.
Arnold, A. Ergül, Y. Zhu, C. Ma, S. Wang, S. Liu, L. Tang, C. Wang, D. Li, Y. Pan, J.
Li, L. Yang, X. Li, G. Xiang, Z. Yang, B. Chen, Z. Dai, Y. Wang, A. Arakelyan, V.
Kuliyev, G. Spotar, N. Girollet, S. Delrot, N. Ollat, P. This, C. Marchal, G. Sarah, V.
Laucou, R. Bacilieri, F. Röckel, P. Guan, A. Jung, M. Riemann, L. Ujmajuridze, T.
Zakalashvili, D. Maghradze, M. Höhn, G. Jahnke, E. Kiss, T. Deák, O. Rahimi, S.
Hübner, F. Grassi, F. Mercati, F. Sunseri, J. Eiras-Dias, A. M. Dumitru, D. Carrasco, A.
Rodriguez-Izquierdo, G. Muñoz, T. Uysal, C. Özer, K. Kazan, M. Xu, Y. Wang, S. Zhu,
J. Lu, M. Zhao, L. Wang, S. Jiu, Y. Zhang, L. Sun, H. Yang, E. Weiss, S. Wang, Y. Zhu,
S. Li, J. Sheng, W. Chen, Dual domestications and origin of traits in grapevine evolution,
Zenodo (2023); https://doi.org/10.5281/zenodo.7523647.
42. M. J. Roach, S. A. Schmidt, A. R. Borneman, Purge Haplotigs: Allelic contig reassignment
for third-gen diploid genome assemblies. BMC Bioinformatics 19, 460 (2018).
doi:10.1186/s12859-018-2485-7
43. B. J. Walker, T. Abeel, T. Shea, M. Priest, A. Abouelliel, S. Sakthikumar, C. A. Cuomo, Q.
Zeng, J. Wortman, S. K. Young, A. M. Earl, Pilon: An integrated tool for comprehensive
microbial variant detection and genome assembly improvement. PLOS ONE 9, e112963
(2014). doi:10.1371/journal.pone.0112963
44. S. Koren, B. P. Walenz, K. Berlin, J. R. Miller, N. H. Bergman, A. M. Phillippy, Canu:
Scalable and accurate long-read assembly via adaptive k-mer weighting and repeat
separation. Genome Res. 27, 722–736 (2017). doi:10.1101/gr.215087.116
45. G. Marçais, A. L. Delcher, A. M. Phillippy, R. Coston, S. L. Salzberg, A. Zimin, MUMmer4:
A fast and versatile genome alignment system. PLOS Comput. Biol. 14, e1005944
(2018). doi:10.1371/journal.pcbi.1005944
46. J. Hu, J. Fan, Z. Sun, S. Liu, NextPolish: A fast and efficient genome polishing tool for long-
read assembly. Bioinformatics 36, 2253–2255 (2020). doi:10.1093/bioinformatics/btz891
47. N. C. Durand, M. S. Shamim, I. Machol, S. S. P. Rao, M. H. Huntley, E. S. Lander, E. L.
Aiden, Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C
Experiments. Cell Syst. 3, 95–98 (2016). doi:10.1016/j.cels.2016.07.002
48. O. Dudchenko, S. S. Batra, A. D. Omer, S. K. Nyquist, M. Hoeger, N. C. Durand, M. S.
Shamim, I. Machol, E. S. Lander, A. P. Aiden, E. L. Aiden, De novo assembly of the
Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–
95 (2017). doi:10.1126/science.aal3327 Medline
49. F. A. Simão, R. M. Waterhouse, P. Ioannidis, E. V. Kriventseva, E. M. Zdobnov, BUSCO:
Assessing genome assembly and annotation completeness with single-copy orthologs.
Bioinformatics 31, 3210–3212 (2015). doi:10.1093/bioinformatics/btv351 Medline
50. O. Jaillon, J.-M. Aury, B. Noel, A. Policriti, C. Clepet, A. Casagrande, N. Choisne, S.
Aubourg, N. Vitulo, C. Jubin, A. Vezzi, F. Legeai, P. Hugueney, C. Dasilva, D. Horner,
E. Mica, D. Jublot, J. Poulain, C. Bruyère, A. Billault, B. Segurens, M. Gouyvenoux, E.
Ugarte, F. Cattonaro, V. Anthouard, V. Vico, C. Del Fabbro, M. Alaux, G. Di Gaspero,
V. Dumas, N. Felice, S. Paillard, I. Juman, M. Moroldo, S. Scalabrin, A. Canaguier, I. Le
Clainche, G. Malacrida, E. Durand, G. Pesole, V. Laucou, P. Chatelet, D. Merdinoglu, M.
Delledonne, M. Pezzotti, A. Lecharny, C. Scarpelli, F. Artiguenave, M. E. Pè, G. Valle,
M. Morgante, M. Caboche, A.-F. Adam-Blondon, J. Weissenbach, F. Quétier, P.
Wincker; French-Italian Public Consortium for Grapevine Genome Characterization, The
grapevine genome sequence suggests ancestral hexaploidization in major angiosperm
phyla. Nature 449, 463–467 (2007). doi:10.1038/nature06148 Medline
51. A. Canaguier, J. Grimplet, G. Di Gaspero, S. Scalabrin, E. Duchêne, N. Choisne, N.
Mohellibi, C. Guichard, S. Rombauts, I. Le Clainche, A. Bérard, A. Chauveau, R.
Bounon, C. Rustenholz, M. Morgante, M. C. Le Paslier, D. Brunel, A.-F. Adam-Blondon,
A new version of the grapevine reference genome assembly (12X.v2) and of its
annotation (VCost.v3). Genom. Data 14, 56–62 (2017). doi:10.1016/j.gdata.2017.09.002
Medline
52. Y. Zhou, A. Minio, M. Massonnet, E. Solares, Y. Lv, T. Beridze, D. Cantu, B. S. Gaut, The
population genetics of structural variants in grapevine domestication. Nat. Plants 5, 965–
979 (2019). doi:10.1038/s41477-019-0507-8 Medline
53. N. Girollet, B. Rubio, C. Lopez-Roques, S. Valière, N. Ollat, P.-F. Bert, De novo phased
assembly of the Vitis riparia grape genome. Sci. Data 6, 127 (2019).
doi:10.1038/s41597-019-0133-3 Medline
54. S. Ou, J. Chen, N. Jiang, Assessing genome assembly quality using the LTR Assembly Index
(LAI). Nucleic Acids Res. 46, e126 (2018). doi:10.1093/nar/gky730 Medline
55. A. M. Bolger, M. Lohse, B. Usadel, Trimmomatic: A flexible trimmer for Illumina sequence
data. Bioinformatics 30, 2114–2120 (2014). doi:10.1093/bioinformatics/btu170 Medline
56. D. Kim, J. M. Paggi, C. Park, C. Bennett, S. L. Salzberg, Graph-based genome alignment and
genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).
doi:10.1038/s41587-019-0201-4 Medline
57. R. Li, H. Zhu, J. Ruan, W. Qian, X. Fang, Z. Shi, Y. Li, S. Li, G. Shan, K. Kristiansen, S. Li,
H. Yang, J. Wang, J. Wang, De novo assembly of human genomes with massively
parallel short read sequencing. Genome Res. 20, 265–272 (2010).
doi:10.1101/gr.097261.109 Medline
58. G. Benson, Tandem repeats finder: A program to analyze DNA sequences. Nucleic Acids
Res. 27, 573–580 (1999). doi:10.1093/nar/27.2.573 Medline
59. D. Ellinghaus, S. Kurtz, U. Willhoeft, LTRharvest, an efficient and flexible software for de
novo detection of LTR retrotransposons. BMC Bioinformatics 9, 18 (2008).
doi:10.1186/1471-2105-9-18 Medline
60. S. Ou, N. Jiang, LTR_FINDER_parallel: Parallelization of LTR_FINDER enabling rapid
identification of long terminal repeat retrotransposons. Mob. DNA 10, 48 (2019).
doi:10.1186/s13100-019-0193-0 Medline
61. S. Ou, N. Jiang, LTR_retriever: A Highly Accurate and Sensitive Program for Identification
of Long Terminal Repeat Retrotransposons. Plant Physiol. 176, 1410–1422 (2018).
doi:10.1104/pp.17.01310 Medline
62. C. Camacho, G. Coulouris, V. Avagyan, N. Ma, J. Papadopoulos, K. Bealer, T. L. Madden,
BLAST+: Architecture and applications. BMC Bioinformatics 10, 421 (2009).
doi:10.1186/1471-2105-10-421 Medline
63. T. M. Lowe, S. R. Eddy, tRNAscan-SE: A program for improved detection of transfer RNA
genes in genomic sequence. Nucleic Acids Res. 25, 955–964 (1997).
doi:10.1093/nar/25.5.955 Medline
64. E. P. Nawrocki, S. R. Eddy, Query-dependent banding (QDB) for faster RNA similarity
searches. PLOS Comput. Biol. 3, e56 (2007). doi:10.1371/journal.pcbi.0030056 Medline
65. P. P. Gardner, J. Daub, J. G. Tate, E. P. Nawrocki, D. L. Kolbe, S. Lindgreen, A. C.
Wilkinson, R. D. Finn, S. Griffiths-Jones, S. R. Eddy, A. Bateman, Rfam: Updates to the
RNA families database. Nucleic Acids Res. 37, D136–D140 (2009).
doi:10.1093/nar/gkn766 Medline
66. M. Stanke, O. Keller, I. Gunduz, A. Hayes, S. Waack, B. Morgenstern, AUGUSTUS: Ab
initio prediction of alternative transcripts. Nucleic Acids Res. 34, W435–W439 (2006).
doi:10.1093/nar/gkl200 Medline
67. C. Burge, S. Karlin, Prediction of complete gene structures in human genomic DNA. J. Mol.
Biol. 268, 78–94 (1997). doi:10.1006/jmbi.1997.0951 Medline
68. W. H. Majoros, M. Pertea, S. L. Salzberg, TigrScan and GlimmerHMM: Two open source ab
initio eukaryotic gene-finders. Bioinformatics 20, 2878–2879 (2004).
doi:10.1093/bioinformatics/bth315 Medline
69. I. Korf, Gene finding in novel genomes. BMC Bioinformatics 5, 59 (2004).
doi:10.1186/1471-2105-5-59 Medline
70. E. M. Gertz, Y.-K. Yu, R. Agarwala, A. A. Schäffer, S. F. Altschul, Composition-based
statistics and translated nucleotide searches: Improving the TBLASTN module of
BLAST. BMC Biol. 4, 41 (2006). doi:10.1186/1741-7007-4-41 Medline
71. E. Birney, M. Clamp, R. Durbin, GeneWise and Genomewise. Genome Res. 14, 988–995
(2004). doi:10.1101/gr.1865504 Medline
72. M. Pertea, G. M. Pertea, C. M. Antonescu, T.-C. Chang, J. T. Mendell, S. L. Salzberg,
StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat.
Biotechnol. 33, 290–295 (2015). doi:10.1038/nbt.3122 Medline
73. B. J. Haas, S. L. Salzberg, W. Zhu, M. Pertea, J. E. Allen, J. Orvis, O. White, C. R. Buell, J.
R. Wortman, Automated eukaryotic gene structure annotation using EVidenceModeler
and the Program to Assemble Spliced Alignments. Genome Biol. 9, R7 (2008).
doi:10.1186/gb-2008-9-1-r7 Medline
74. M. G. Grabherr, B. J. Haas, M. Yassour, J. Z. Levin, D. A. Thompson, I. Amit, X. Adiconis,
L. Fan, R. Raychowdhury, Q. Zeng, Z. Chen, E. Mauceli, N. Hacohen, A. Gnirke, N.
Rhind, F. di Palma, B. W. Birren, C. Nusbaum, K. Lindblad-Toh, N. Friedman, A.
Regev, Full-length transcriptome assembly from RNA-Seq data without a reference
genome. Nat. Biotechnol. 29, 644–652 (2011). doi:10.1038/nbt.1883 Medline
75. A. Bairoch, R. Apweiler, The SWISS-PROT protein sequence database and its supplement
TrEMBL in 2000. Nucleic Acids Res. 28, 45–48 (2000). doi:10.1093/nar/28.1.45 Medline
76. M. Kanehisa, Y. Sato, M. Kawashima, M. Furumichi, M. Tanabe, KEGG as a reference
resource for gene and protein annotation. Nucleic Acids Res. 44, D457–D462 (2016).
doi:10.1093/nar/gkv1070 Medline
77. S. F. Altschul, W. Gish, W. Miller, E. W. Myers, D. J. Lipman, Basic local alignment search
tool. J. Mol. Biol. 215, 403–410 (1990). doi:10.1016/S0022-2836(05)80360-2 Medline
78. P. Jones, D. Binns, H.-Y. Chang, M. Fraser, W. Li, C. McAnulla, H. McWilliam, J. Maslen,
A. Mitchell, G. Nuka, S. Pesseat, A. F. Quinn, A. Sangrador-Vegas, M. Scheremetjew,
S.-Y. Yong, R. Lopez, S. Hunter, InterProScan 5: Genome-scale protein function
classification. Bioinformatics 30, 1236–1240 (2014). doi:10.1093/bioinformatics/btu031
Medline
79. A. Lupas, M. Van Dyke, J. Stock, Predicting coiled coils from protein sequences. Science
252, 1162–1164 (1991). doi:10.1126/science.252.5009.1162 Medline
80. C. Yeats, M. Maibaum, R. Marsden, M. Dibley, D. Lee, S. Addou, C. A. Orengo, Gene3D:
Modelling protein structure, function and evolution. Nucleic Acids Res. 34, D281–D284
(2006). doi:10.1093/nar/gkj057 Medline
81. I. Pedruzzi, C. Rivoire, A. H. Auchincloss, E. Coudert, G. Keller, E. de Castro, D. Baratin, B.
A. Cuche, L. Bougueleret, S. Poux, N. Redaschi, I. Xenarios, A. Bridge, HAMAP in
2015: Updates to the protein family classification and annotation system. Nucleic Acids
Res. 43, D1064–D1070 (2015). doi:10.1093/nar/gku1002 Medline
82. P. D. Thomas, M. J. Campbell, A. Kejariwal, H. Mi, B. Karlak, R. Daverman, K. Diemer, A.
Muruganujan, A. Narechania, PANTHER: A library of protein families and subfamilies
indexed by function. Genome Res. 13, 2129–2141 (2003). doi:10.1101/gr.772403
Medline
83. R. D. Finn, A. Bateman, J. Clements, P. Coggill, R. Y. Eberhardt, S. R. Eddy, A. Heger, K.
Hetherington, L. Holm, J. Mistry, E. L. L. Sonnhammer, J. Tate, M. Punta, Pfam: The
protein families database. Nucleic Acids Res. 42, D222–D230 (2014).
doi:10.1093/nar/gkt1223 Medline
84. C. H. Wu, A. Nikolskaya, H. Huang, L.-S. L. Yeh, D. A. Natale, C. R. Vinayaka, Z.-Z. Hu,
R. Mazumder, S. Kumar, P. Kourtesis, R. S. Ledley, B. E. Suzek, L. Arminski, Y. Chen,
J. Zhang, J. L. Cardenas, S. Chung, J. Castro-Alvear, G. Dinkov, W. C. Barker, PIRSF:
Family classification system at the Protein Information Resource. Nucleic Acids Res. 32,
D112–D114 (2004). doi:10.1093/nar/gkh097 Medline
85. T. K. Attwood, M. J. Blythe, D. R. Flower, A. Gaulton, J. E. Mabey, N. Maudling, L.
McGregor, A. L. Mitchell, G. Moulton, K. Paine, P. Scordis, PRINTS and PRINTS-S
shed light on protein ancestry. Nucleic Acids Res. 30, 239–241 (2002).
doi:10.1093/nar/30.1.239 Medline
86. F. Servant, C. Bru, S. Carrère, E. Courcelle, J. Gouzy, D. Peyruc, D. Kahn, ProDom:
Automated clustering of homologous domains. Brief. Bioinform. 3, 246–251 (2002).
doi:10.1093/bib/3.3.246 Medline
87. C. J. A. Sigrist, E. de Castro, L. Cerutti, B. A. Cuche, N. Hulo, A. Bridge, L. Bougueleret, I.
Xenarios, New and continuing developments at PROSITE. Nucleic Acids Res. 41, D344–
D347 (2013). doi:10.1093/nar/gks1067 Medline
88. I. Letunic, T. Doerks, P. Bork, SMART 6: Recent updates and new developments. Nucleic
Acids Res. 37, D229–D232 (2009). doi:10.1093/nar/gkn808 Medline
89. D. Wilson, R. Pethica, Y. Zhou, C. Talbot, C. Vogel, M. Madera, C. Chothia, J. Gough,
SUPERFAMILY—Sophisticated comparative genomics, data mining, visualization and
phylogeny. Nucleic Acids Res. 37 (suppl_1), D380–D386 (2009).
doi:10.1093/nar/gkn762 Medline
90. D. H. Haft, B. J. Loftus, D. L. Richardson, F. Yang, J. A. Eisen, I. T. Paulsen, O. White,
TIGRFAMs: A protein family resource for the functional identification of proteins.
Nucleic Acids Res. 29, 41–43 (2001). doi:10.1093/nar/29.1.41 Medline
91. K. Margaryan, G. Melyan, F. Röckel, R. Töpfer, E. Maul, Genetic diversity of Armenian
grapevine (Vitis vinifera L.) germplasm: Molecular characterization and parentage
analysis. Biology 10, 1279 (2021). doi:10.3390/biology10121279 Medline
92. A. Ergül, G. Perez-Rivera, G. Söylemezoğlu, K. Kazan, R. Arroyo-Garcia, Genetic diversity
in Anatolian wild grapes (Vitis vinifera subsp. sylvestris) estimated by SSR markers.
Plant Genet. Resour. 9, 375–383 (2011). doi:10.1017/S1479262111000013
93. G. Zdunić, K. Lukšić, Z. A. Nagy, A. Mucalo, K. Hančević, T. Radić, L. Butorac, G. G.
Jahnke, E. Kiss, G. Ledesma-Krist, M. Regvar, M. Likar, A. Piltaver, M. Žulj Mihaljević,
E. Maletić, I. Pejić, M. Werling, E. Maul, Genetic structure and relationships among wild
and cultivated grapevines from central Europe and part of the western Balkan Peninsula.
Genes 11, 962 (2020). doi:10.3390/genes11090962 Medline
94. V. Laucou, T. Lacombe, F. Dechesne, R. Siret, J.-P. Bruno, M. Dessup, T. Dessup, P.
Ortigosa, P. Parra, C. Roux, S. Santoni, D. Varès, J.-P. Péros, J.-M. Boursiquot, P. This,
High throughput analysis of grape genetic diversity as a tool for germplasm collection
management. Theor. Appl. Genet. 122, 1233–1245 (2011).
doi:10.1007/s00122-010-1527-y Medline
95. R. Lózsa, N. Xia, T. Deák, G. D. Bisztray, Chloroplast diversity indicates two independent
maternal lineages in cultivated grapevine (Vitis vinifera L. subsp. vinifera). Genet.
Resour. Crop Evol. 62, 419–429 (2015). doi:10.1007/s10722-014-0169-3
96. J. Cunha, J. Ibáñez, M. Teixeira-Santos, J. Brazão, P. Fevereiro, J. M. Martínez-Zapater, J. E.
Eiras-Dias, Genetic Relationships Among Portuguese Cultivated and Wild Vitis vinifera
L. Germplasm. Front. Plant Sci. 11, 127 (2020). doi:10.3389/fpls.2020.00127 Medline
97. F. Grassi, F. D. Mattia, G. Zecca, F. Sala, M. Labra, Historical isolation and Quaternary
range expansion of divergent lineages in wild grapevine. Biol. J. Linn. Soc. Lond. 95,
611–619 (2008). doi:10.1111/j.1095-8312.2008.01081.x
98. S. Chen, Y. Zhou, Y. Chen, J. Gu, fastp: An ultra-fast all-in-one FASTQ preprocessor.
Bioinformatics 34, i884–i890 (2018). doi:10.1093/bioinformatics/bty560 Medline
99. P. Danecek, J. K. Bonfield, J. Liddle, J. Marshall, V. Ohan, M. O. Pollard, A. Whitwham, T.
Keane, S. A. McCarthy, R. M. Davies, H. Li, Twelve years of SAMtools and BCFtools.
Gigascience 10, giab008 (2021). doi:10.1093/gigascience/giab008 Medline
100. X. Dong, W. Chen, Z. Liang, X. Li, P. Nick, S. Chen, Y. Dong, S. Li, J. Sheng, VitisGDB:
The multifunctional database for grapevine breeding and genetics. Mol. Plant 13, 1098–
1100 (2020). doi:10.1016/j.molp.2020.05.002 Medline
101. G. A. Van der Auwera, M. O. Carneiro, C. Hartl, R. Poplin, G. Del Angel, A. Levy-
Moonshine, T. Jordan, K. Shakir, D. Roazen, J. Thibault, E. Banks, K. V. Garimella, D.
Altshuler, S. Gabriel, M. A. DePristo, From FastQ data to high confidence variant calls:
The Genome Analysis Toolkit best practices pipeline. Curr. Protoc. Bioinformatics 43,
11.10.1–11.10.33 (2013). Medline
102. P. Danecek, A. Auton, G. Abecasis, C. A. Albers, E. Banks, M. A. DePristo, R. E.
Handsaker, G. Lunter, G. T. Marth, S. T. Sherry, G. McVean, R. Durbin; 1000 Genomes
Project Analysis Group, The variant call format and VCFtools. Bioinformatics 27, 2156–
2158 (2011). doi:10.1093/bioinformatics/btr330 Medline
103. V. Laucou, A. Launay, R. Bacilieri, T. Lacombe, A.-F. Adam-Blondon, A. Bérard, A.
Chauveau, M. T. de Andrés, L. Hausmann, J. Ibáñez, M. C. Le Paslier, D. Maghradze, J.
M. Martinez-Zapater, E. Maul, M. Ponnaiah, R. Töpfer, J.-P. Péros, J.-M. Boursiquot,
Extended diversity analysis of cultivated grapevine Vitis vinifera with 10K genome-wide
SNPs. PLOS ONE 13, e0192540 (2018). doi:10.1371/journal.pone.0192540 Medline
104. Z. Zhang, S. Schwartz, L. Wagner, W. Miller, A greedy algorithm for aligning DNA
sequences. J. Comput. Biol. 7, 203–214 (2000). doi:10.1089/10665270050081478
Medline
105. D. Benjamin, T. Sato, K. Cibulskis, G. Getz, C. Stewart, L. Lichtenstein, Calling Somatic
SNVs and Indels with Mutect2. bioRxiv, 861054 (2019).
106. C. Plomion, J.-M. Aury, J. Amselem, T. Leroy, F. Murat, S. Duplessis, S. Faye, N.
Francillonne, K. Labadie, G. Le Provost, I. Lesur, J. Bartholomé, P. Faivre-Rampant, A.
Kohler, J.-C. Leplé, N. Chantret, J. Chen, A. Diévart, T. Alaeitabar, V. Barbe, C. Belser,
H. Bergès, C. Bodénès, M.-B. Bogeat-Triboulot, M.-L. Bouffaud, B. Brachi, E.
Chancerel, D. Cohen, A. Couloux, C. Da Silva, C. Dossat, F. Ehrenmann, C. Gaspin, J.
Grima-Pettenati, E. Guichoux, A. Hecker, S. Herrmann, P. Hugueney, I. Hummel, C.
Klopp, C. Lalanne, M. Lascoux, E. Lasserre, A. Lemainque, M.-L. Desprez-Loustau, I.
Luyten, M.-A. Madoui, S. Mangenot, C. Marchal, F. Maumus, J. Mercier, C. Michotey,
O. Panaud, N. Picault, N. Rouhier, O. Rué, C. Rustenholz, F. Salin, M. Soler, M. Tarkka,
A. Velt, A. E. Zanne, F. Martin, P. Wincker, H. Quesneville, A. Kremer, J. Salse, Oak
genome reveals facets of long lifespan. Nat. Plants 4, 440–452 (2018).
doi:10.1038/s41477-018-0172-3 Medline
107. K. Wang, M. Li, H. Hakonarson, ANNOVAR: Functional annotation of genetic variants
from high-throughput sequencing data. Nucleic Acids Res. 38, e164 (2010).
doi:10.1093/nar/gkq603 Medline
108. Y. Choi, A. P. Chan, PROVEAN web server: A tool to predict the functional effect of
amino acid substitutions and indels. Bioinformatics 31, 2745–2747 (2015).
doi:10.1093/bioinformatics/btv195 Medline
109. E. D. O. Roberson, J. Pevsner, Visualization of shared genomic regions and meiotic
recombination in high-density SNP data. PLOS ONE 4, e6711 (2009).
doi:10.1371/journal.pone.0006711 Medline
110. R. K. Waples, A. Albrechtsen, I. Moltke, Allele frequency-free inference of close familial
relationships from genotypes or low-depth sequencing data. Mol. Ecol. 28, 35–48 (2019).
doi:10.1111/mec.14954 Medline
111. E. L. Stevens, G. Heckenberg, E. D. O. Roberson, J. D. Baugher, T. J. Downey, J. Pevsner,
Inference of relationships in population data using identity-by-descent and identity-by-
state. PLOS Genet. 7, e1002287 (2011). doi:10.1371/journal.pgen.1002287 Medline
112. T.-H. Lee, H. Guo, X. Wang, C. Kim, A. H. Paterson, SNPhylo: A pipeline to construct a
phylogenetic tree from huge SNP data. BMC Genomics 15, 162 (2014).
doi:10.1186/1471-2164-15-162 Medline
113. A. M. Kozlov, D. Darriba, T. Flouri, B. Morel, A. Stamatakis, RAxML-NG: A fast, scalable
and user-friendly tool for maximum likelihood phylogenetic inference. Bioinformatics
35, 4453–4455 (2019). doi:10.1093/bioinformatics/btz305 Medline
114. F. Lemoine, J. B. Domelevo Entfellner, E. Wilkinson, D. Correia, M. Dávila Felipe, T. De
Oliveira, O. Gascuel, Renewing Felsenstein’s phylogenetic bootstrap in the era of big
data. Nature 556, 452–456 (2018). doi:10.1038/s41586-018-0043-0 Medline
115. C. C. Chang, C. C. Chow, L. C. Tellier, S. Vattikuti, S. M. Purcell, J. J. Lee, Second-
generation PLINK: Rising to the challenge of larger and richer datasets. Gigascience 4, 7
(2015). doi:10.1186/s13742-015-0047-8 Medline
116. D. H. Huson, D. Bryant, Application of phylogenetic networks in evolutionary studies. Mol.
Biol. Evol. 23, 254–267 (2006). doi:10.1093/molbev/msj030 Medline
117. J. Yang, S. H. Lee, M. E. Goddard, P. M. Visscher, GCTA: A tool for genome-wide
complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011).
doi:10.1016/j.ajhg.2010.11.011 Medline
118. D. H. Alexander, J. Novembre, K. Lange, Fast model-based estimation of ancestry in
unrelated individuals. Genome Res. 19, 1655–1664 (2009). doi:10.1101/gr.094052.109
Medline
119. D. H. Alexander, K. Lange, Enhancements to the ADMIXTURE algorithm for individual
ancestry estimation. BMC Bioinformatics 12, 246 (2011). doi:10.1186/1471-2105-12-246
Medline
120. J. Gimbernat-Mayol, A. Dominguez Mantes, C. D. Bustamante, D. Mas Montserrat, A. G.
Ioannidis, Archetypal Analysis for population genetics. PLOS Comput. Biol. 18,
e1010301 (2022). doi:10.1371/journal.pcbi.1010301 Medline
121. C. Zhang, S.-S. Dong, J.-Y. Xu, W.-M. He, T.-L. Yang, PopLDdecay: A fast and effective
tool for linkage disequilibrium decay analysis based on variant call format files.
Bioinformatics 35, 1786–1788 (2019). doi:10.1093/bioinformatics/bty875 Medline
122. J. L. Brown, D. J. Hill, A. M. Dolan, A. C. Carnaval, A. M. Haywood, PaleoClim, high
spatial resolution paleoclimate surfaces for global land areas. Sci. Data 5, 180254 (2018).
doi:10.1038/sdata.2018.254 Medline
123. X. Feng, D. S. Park, Y. Liang, R. Pandey, M. Papeş, Collinearity in ecological niche
modeling: Confusions and challenges. Ecol. Evol. 9, 10365–10376 (2019).
doi:10.1002/ece3.5555 Medline
124. R. Muscarella, P. J. Galante, M. Soley‐Guardia, R. A. Boria, J. M. Kass, M. Uriarte, R. P.
Anderson, ENMeval: An R package for conducting spatially independent evaluations and
estimating optimal model complexity for Maxent ecological niche models. Methods Ecol.
Evol. 5, 1198–1205 (2014). doi:10.1111/2041-210X.12261
125. S. J. Phillips, R. P. Anderson, M. Dudík, R. E. Schapire, M. E. Blair, Opening the black
box: An open‐source release of Maxent. Ecography 40, 887–893 (2017).
doi:10.1111/ecog.03049
126. S. Schiffels, K. Wang, “MSMC and MSMC2: The Multiple Sequentially Markovian
Coalescent,” in Statistical Population Genomics, J. Y. Dutheil, Ed. (Humana, 2020), vol.
2090 of Methods in Molecular Biology, pp. 147–166.
127. O. Delaneau, J. Marchini, J.-F. Zagury, A linear complexity phasing method for thousands
of genomes. Nat. Methods 9, 179–181 (2011). doi:10.1038/nmeth.1785 Medline
128. X. Liu, Y.-X. Fu, Stairway Plot 2: Demographic history inference with folded SNP
frequency spectra. Genome Biol. 21, 280 (2020). doi:10.1186/s13059-020-02196-9
Medline
129. J. Kamm, J. Terhorst, R. Durbin, Y. S. Song, Efficiently inferring the demographic history
of many populations with allele count data. J. Am. Stat. Assoc. 115, 1472–1487 (2020).
Medline
130. J. K. Pickrell, J. K. Pritchard, Inference of population splits and mixtures from genome-
wide allele frequency data. PLOS Genet. 8, e1002967 (2012).
doi:10.1371/journal.pgen.1002967 Medline
131. R. R. Fitak, OptM: Estimating the optimal number of migration edges on population trees
using Treemix. Biol. Methods Protoc. 6, bpab017 (2021).
doi:10.1093/biomethods/bpab017 Medline
132. N. Patterson, P. Moorjani, Y. Luo, S. Mallick, N. Rohland, Y. Zhan, T. Genschoreck, T.
Webster, D. Reich, Ancient admixture in human history. Genetics 192, 1065–1093
(2012). doi:10.1534/genetics.112.145037 Medline
133. M. Malinsky, M. Matschiner, H. Svardal, Dsuite - Fast D-statistics and related admixture
evidence from VCF files. Mol. Ecol. Resour. 21, 584–595 (2021).
doi:10.1111/1755-0998.13265 Medline
134. P. Librado, J. Rozas, DnaSP v5: A software for comprehensive analysis of DNA
polymorphism data. Bioinformatics 25, 1451–1452 (2009).
doi:10.1093/bioinformatics/btp187 Medline
135. J. W. Leigh, D. Bryant, popart: Full‐feature software for haplotype network construction.
Methods Ecol. Evol. 6, 1110–1116 (2015). doi:10.1111/2041-210X.12410
136. L. Jiang, Z. Zheng, H. Fang, J. Yang, A generalized linear mixed model association tool for
biobank-scale data. Nat. Genet. 53, 1616–1621 (2021). doi:10.1038/s41588-021-00954-4
Medline
137. F. Emanuelli, S. Lorenzi, L. Grzeskowiak, V. Catalano, M. Stefanini, M. Troggio, S. Myles,
J. M. Martinez-Zapater, E. Zyprian, F. M. Moreira, M. S. Grando, Genetic diversity and
population structure assessed by SSR and SNP markers in a large germplasm collection
of grape. BMC Plant Biol. 13, 39 (2013). doi:10.1186/1471-2229-13-39 Medline
138. S. Léran, K. Varala, J.-C. Boyer, M. Chiurazzi, N. Crawford, F. Daniel-Vedele, L. David,
R. Dickstein, E. Fernandez, B. Forde, W. Gassmann, D. Geiger, A. Gojon, J.-M. Gong,
B. A. Halkier, J. M. Harris, R. Hedrich, A. M. Limami, D. Rentsch, M. Seo, Y.-F. Tsay,
M. Zhang, G. Coruzzi, B. Lacombe, A unified nomenclature of NITRATE
TRANSPORTER 1/PEPTIDE TRANSPORTER family members in plants. Trends Plant
Sci. 19, 5–9 (2014). doi:10.1016/j.tplants.2013.08.008 Medline
139. J.-F. Briat, K. Ravet, N. Arnaud, C. Duc, J. Boucherez, B. Touraine, F. Cellier, F. Gaymard,
New insights into ferritin synthesis and function highlight a link between iron
homeostasis and oxidative stress in plants. Ann. Bot. (Lond.) 105, 811–822 (2010).
doi:10.1093/aob/mcp128 Medline
140. L. Giacomelli, O. Rota-Stabelli, D. Masuero, A. K. Acheampong, M. Moretto, L. Caputi, U.
Vrhovsek, C. Moser, Gibberellin metabolism in Vitis vinifera L. during bloom and fruit-
set: Functional characterization and evolution of grapevine gibberellin oxidases. J. Exp.
Bot. 64, 4403–4419 (2013). doi:10.1093/jxb/ert251 Medline
141. A. Barkan, I. Small, Pentatricopeptide repeat proteins in plants. Annu. Rev. Plant Biol. 65,
415–442 (2014). doi:10.1146/annurev-arplant-050213-040159 Medline
142. H. Xing, X. Fu, C. Yang, X. Tang, L. Guo, C. Li, C. Xu, K. Luo, Genome-wide
investigation of pentatricopeptide repeat gene family in poplar and their expression
analysis in response to biotic and abiotic stresses. Sci. Rep. 8, 2817 (2018).
doi:10.1038/s41598-018-21269-1 Medline
143. E. Mazzucotelli, S. Belloni, D. Marone, A. De Leonardis, D. Guerra, N. Di Fonzo, L.
Cattivelli, A. Mastrangelo, The E3 ubiquitin ligase gene family in plants: Regulation by
degradation. Curr. Genomics 7, 509–522 (2006). doi:10.2174/138920206779315728
Medline
144. J. Jirschitzka, G. W. Schmidt, M. Reichelt, B. Schneider, J. Gershenzon, J. C. D’Auria,
Plant tropane alkaloid biosynthesis evolved independently in the Solanaceae and
Erythroxylaceae. Proc. Natl. Acad. Sci. U.S.A. 109, 10304–10309 (2012).
doi:10.1073/pnas.1200473109 Medline
145. M. A. Hicks, A. E. Barber 2nd, L. A. Giddings, J. Caldwell, S. E. O’Connor, P. C. Babbitt,
The evolution of function in strictosidine synthase-like proteins. Proteins 79, 3082–3098
(2011). doi:10.1002/prot.23135 Medline
146. J. Chong, M.-C. Piron, S. Meyer, D. Merdinoglu, C. Bertsch, P. Mestre, The SWEET
family of sugar transporters in grapevine: VvSWEET4 is involved in the interaction with
Botrytis cinerea. J. Exp. Bot. 65, 6589–6601 (2014). doi:10.1093/jxb/eru375 Medline
147. J. Ji, L. Yang, Z. Fang, Y. Zhang, M. Zhuang, H. Lv, Y. Wang, Plant SWEET family of
sugar transporters: Structure, evolution and biological functions. Biomolecules 12, 205
(2022). doi:10.3390/biom12020205 Medline
148. N. Dudareva, J. C. D’Auria, K. H. Nam, R. A. Raguso, E. Pichersky, Acetyl-
CoA:benzylalcohol acetyltransferase—An enzyme involved in floral scent production in
Clarkia breweri. Plant J. 14, 297–304 (1998). doi:10.1046/j.1365-313X.1998.00121.x
Medline
149. F. Bao, A. Ding, T. Zhang, L. Luo, J. Wang, T. Cheng, Q. Zhang, Expansion of PmBEAT
genes in the Prunus mume genome induces characteristic floral scent production. Hortic.
Res. 6, 24 (2019). doi:10.1038/s41438-018-0104-4 Medline
150. M. C. Peppi, M. A. Walker, M. W. Fidelibus, Application of abscisic acid rapidly
upregulated UFGT gene expression and improved color of grape berries. Vitis 47, 11–14
(2008).
151. B. D. Kohorn, S. L. Kohorn, The cell wall-associated kinases, WAKs, as pectin receptors.
Front. Plant Sci. 3, 88 (2012). doi:10.3389/fpls.2012.00088 Medline
152. D. Q. Fuller, C. J. Stevens, Between domestication and civilization: The role of agriculture
and arboriculture in the emergence of the first urban societies. Veg. Hist. Archaeobot. 28,
263–282 (2019). doi:10.1007/s00334-019-00727-4 Medline
153. S. Abbo, A. Gopher, S. Lev‐Yadun, “Fruit domestication in the Western Asia,” in Plant
Breeding Reviews, J. Janick, Ed. (Wiley, 2016), vol. 39, pp. 325–378.