
LETTER OPEN

doi:10.1038/nature14668

The octopus genome and the evolution of cephalopod
neural and morphological novelties
Caroline B. Albertin1*, Oleg Simakov2,3*, Therese Mitros4, Z. Yan Wang5, Judit R. Pungor5, Eric Edsinger-Gonzales2,4,
Sydney Brenner2, Clifton W. Ragsdale1,5 & Daniel S. Rokhsar2,4,6

Coleoid cephalopods (octopus, squid and cuttlefish) are active, resourceful predators with a rich behavioural repertoire1. They have the largest nervous systems among the invertebrates2 and present other striking morphological innovations including camera-like eyes, prehensile arms, a highly derived early embryogenesis and a remarkably sophisticated adaptive colouration system1,3. To investigate the molecular bases of cephalopod brain and body innovations, we sequenced the genome and multiple transcriptomes of the California two-spot octopus, Octopus bimaculoides. We found no evidence for hypothesized whole-genome duplications in the octopus lineage4–6. The core developmental and neuronal gene repertoire of the octopus is broadly similar to that found across invertebrate bilaterians, except for massive expansions in two gene families previously thought to be uniquely enlarged in vertebrates: the protocadherins, which regulate neuronal development, and the C2H2 superfamily of zinc-finger transcription factors. Extensive messenger RNA editing generates transcript and protein diversity in genes involved in neural excitability, as previously described7, as well as in genes participating in a broad range of other cellular functions. We identified hundreds of cephalopod-specific genes, many of which showed elevated expression levels in such specialized structures as the skin, the suckers and the nervous system. Finally, we found evidence for large-scale genomic rearrangements that are closely associated with transposable element expansions. Our analysis suggests that substantial expansion of a handful of gene families, along with extensive remodelling of genome linkage and repetitive content, played a critical role in the evolution of cephalopod morphological innovations, including their large and complex nervous systems.

Soft-bodied cephalopods such as the octopus (Fig. 1a) show remarkable morphological departures from the basic molluscan body plan, including dexterous arms lined with hundreds of suckers that function as specialized tactile and chemosensory organs, and an elaborate chromatophore system under direct neural control that enables rapid changes in appearance1,8. The octopus nervous system is vastly modified in size and organization relative to other molluscs, comprising a circumesophageal brain, paired optic lobes and axial nerve cords in each arm2,3. Together these structures contain nearly half a billion neurons, more than six times the number in a mouse brain2,9. Extant coleoid cephalopods show extraordinarily sophisticated behaviours including complex problem solving, task-dependent conditional discrimination, observational learning and spectacular displays of camouflage1,10 (Supplementary Videos 1 and 2).

To explore the genetic features of these highly specialized animals, we sequenced the Octopus bimaculoides genome by a whole-genome shotgun approach (Supplementary Note 1) and annotated it using extensive transcriptome sequence from 12 tissues (Methods and Supplementary Note 2). The genome assembly captures more than 97% of expressed protein-coding genes and 83% of the estimated 2.7 gigabase (Gb) genome size (Methods and Supplementary Notes 1–3). The unassembled fraction is dominated by high-copy repetitive sequences (Supplementary Note 1). Nearly 45% of the assembled genome is composed of repetitive elements, with two bursts of transposon activity occurring ~25-million and ~56-million years ago (Mya) (Supplementary Note 4).

We predicted 33,638 protein-coding genes (Methods and Supplementary Note 4) and found alternate splicing at 2,819 loci, but no locus showed an unusually high number of splice variants (Supplementary Note 4). A-to-G discrepancies between the assembled genome and transcriptome sequences provided evidence for extensive mRNA editing by adenosine deaminases acting on RNA (ADARs). Many candidate edits are enriched in neural tissues7 and are found in a range of gene families, including ‘housekeeping’ genes such as the tubulins, which suggests that RNA edits are more widespread than previously appreciated (Extended Data Fig. 1 and Supplementary Note 5).

Based primarily on chromosome number, several researchers proposed that whole-genome duplications were important in the evolution of the cephalopod body plan4–6, paralleling the role ascribed to the independent whole-genome duplication events that occurred early in vertebrate evolution11. Although this is an attractive framework for both gene family expansion and increased regulatory complexity across multiple genes, we found no evidence for it. The gene family expansions present in octopus are predominantly organized in clusters along the genome, rather than distributed in doubly conserved synteny as expected for a paleopolyploid12,13 (Supplementary Note 6.2). Although genes that regulate development are often retained in multiple copies after paleopolyploidy in other lineages, they are not generally expanded in octopus relative to limpet, oyster and other invertebrate bilaterians11,14 (Table 1 and Supplementary Notes 7.4 and 8).

Hox genes are commonly retained in multiple copies following whole-genome duplication15. In O. bimaculoides, however, we found only a single Hox complement, consistent with the single set of Hox transcripts identified in the bobtail squid Euprymna scolopes with PCR16. Remarkably, octopus Hox genes are not organized into clusters as in most other bilaterian genomes15, but are completely atomized (Extended Data Fig. 2 and Supplementary Note 9). Although we cannot rule out whole-genome duplication followed by considerable gene loss, the extent of loss needed to support this claim would far exceed that which has been observed in other paleopolyploid lineages, and it is more plausible that chromosome number in coleoids increased by chromosome fragmentation.

Mechanisms other than whole-genome duplications can drive genomic novelty, including expansion of existing gene families, evolution of novel genes, modification of gene regulatory networks, and reorganization of the genome through transposon activity. Within the O. bimaculoides genome, we found evidence for all of these
1Department of Organismal Biology and Anatomy, University of Chicago, Chicago, Illinois 60637, USA.
2Okinawa Institute of Science and Technology Graduate University, Onna, Okinawa 9040495, Japan.
3Centre for Organismal Studies, University of Heidelberg, 69117 Heidelberg, Germany.
4Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA.
5Department of Neurobiology, University of Chicago, Chicago, Illinois 60637, USA.
6Department of Energy Joint Genome Institute, Walnut Creek, California 94598, USA.
*These authors contributed equally to this work.

220 | NATURE | VOL 524 | 13 AUGUST 2015
©2015 Macmillan Publishers Limited. All rights reserved

mechanisms, including expansions in several gene families, a suite of octopus- and cephalopod-specific genes, and extensive genome shuffling.

In gene family content, domain architecture and exon–intron structure, the octopus genome broadly resembles that of the limpet Lottia gigantea17, the polychaete annelid Capitella teleta17 and the cephalochordate Branchiostoma floridae14 (Supplementary Note 7 and Extended Data Fig. 3). Relative to these invertebrate bilaterians, we found a fairly standard set of developmentally important transcription factors and signalling pathway genes, suggesting that the evolution of the cephalopod body plan did not require extreme expansions of these ‘toolkit’ genes (Table 1 and Supplementary Note 8.3). However, statistical analysis of protein domain distributions across animal genomes did identify several notable gene family expansions in octopus, including protocadherins, C2H2 zinc-finger proteins (C2H2 ZNFs), interleukin-17-like genes (IL17-like), G-protein-coupled receptors (GPCRs), chitinases and sialins (Figs 1b, 2 and 3, Extended Data Figs 4–6 and Supplementary Notes 8 and 10).

The octopus genome encodes 168 multi-exonic protocadherin genes, nearly three-quarters of which are found in tandem clusters on the genome (Fig. 2b), a striking expansion relative to the 17–25 genes found in Lottia, Crassostrea gigas (oyster) and Capitella genomes. Protocadherins are homophilic cell adhesion molecules whose function has been primarily studied in mammals, where they are required for neuronal development and survival, as well as synaptic specificity18. Single protocadherin genes are found in the invertebrate deuterostomes Saccoglossus kowalevskii (acorn worm) and Strongylocentrotus purpuratus (sea urchin), indicating that their absence in Drosophila melanogaster and Caenorhabditis elegans is due to gene loss. Vertebrates also show a remarkable expansion of the protocadherin repertoire, which is generated by complex splicing from a clustered locus rather than tandem gene duplication (reviewed in ref. 19). Thus both octopuses and vertebrates have independently evolved a diverse array of protocadherin genes.

A search of available transcriptome data from the longfin inshore squid Doryteuthis (formerly, Loligo) pealeii20 also demonstrated an expanded number of protocadherin genes (Supplementary Note 8.2). Surprisingly, our phylogenetic analyses suggest that the squid and octopus protocadherin arrays arose independently. Unlinked octopus protocadherins appear to have expanded ~135 Mya, after octopuses diverged from squid. In contrast, clustered octopus protocadherins are much more similar in sequence, either due to more recent duplications or gene conversion as found in clustered protocadherins in zebrafish and mammals21.

The expression of protocadherins in octopus neural tissues (Fig. 2) is consistent with a central role for these genes in establishing and maintaining cephalopod nervous system organization as they do in vertebrates. Protocadherin diversity provides a mechanism for regulating the short-range interactions needed for the assembly of local neural circuits18, which is where the greatest complexity in the cephalopod nervous system appears2. The importance of local neuropil interactions, rather than long-range connections, is probably due to the limits placed on axon density and connectivity by the absence of myelin, as thick axons are then required for rapid high-fidelity signal conduction over long distances. The sequence divergence between octopus and

Table 1 | Metazoan developmental control genes

                             Obi    Lgi   Cte   Dme   Cel    Bfl   Hsa
Ligands
Fibroblast growth factor       3      2     1     3     3      8    22
Wnt                           12     10    12     7     5     17    19
TGFβ/BMP                      12      9    14     6     5     22    33
Delta/Jagged                   4      1     1     2     4      2     7
Hedgehog                       1      1     1     1     0      1     3
Axon guidance                 10      9     9     6     8     23    33
Transcription factors
C2H2 zinc-finger           1,790    413   222   326   211  1,338   764
Homeodomain                  114    121   111   104    99    133   333
High mobility group           23     15    14    13    16     51   125
Helix loop helix              50     63    64    59    42     78   118
Nuclear hormone receptor      40     44    45    16   274     33    48
Fox                           16     28    26    17    18     42    43
Tbox                           9      9     7     8    21      9    18

Number of members of developmental ligand and transcription factor families from O. bimaculoides and selected other taxa. Dendrogram above species names reflects their evolutionary relationships. Bfl, Branchiostoma floridae; Cel, Caenorhabditis elegans; Cte, Capitella teleta; Dme, Drosophila melanogaster; Hsa, Homo sapiens; Lgi, Lottia gigantea; Obi, O. bimaculoides.

Figure 1 | Octopus anatomy and gene family representation analysis. a, Schematic of Octopus bimaculoides anatomy, highlighting the tissues sampled for transcriptome analysis: viscera (heart, kidney and hepatopancreas), gonads (ova or testes), retina, optic lobe (OL), supraesophageal brain (Supra), subesophageal brain (Sub), posterior salivary gland (PSG), axial nerve cord (ANC), suckers, skin and stage 15 (St15) embryo. Skin sampled for transcriptome analysis included the eyespot, shown in light blue. b, C2H2 and protocadherin domain-containing gene families are expanded in octopus. Enriched Pfam domains were identified in lophotrochozoans (green) and molluscs (yellow), including O. bimaculoides (light blue). For a domain to be labelled as expanded in a group, at least 50% of its associated gene families need a corrected P value of 0.01 against the outgroup average. Some Pfams (for example, Cadherin and Cadherin_2) may occur in the same gene, however multiple domains in a given gene were counted only once. Bfl, Branchiostoma floridae; Cel, Caenorhabditis elegans; Cgi, Crassostrea gigas; Cte, Capitella teleta; Dme, Drosophila melanogaster; Dre, Danio rerio; Gga, Gallus gallus; Hro, Helobdella robusta; Hsa, Homo sapiens; Lch, Latimeria chalumnae; Lgi, Lottia gigantea; Mmu, Mus musculus; Obi, O. bimaculoides; Pfu, Pinctada fucata; Xtr, Xenopus tropicalis.

[Figure 1b heat map. Rows (Pfam domains): PF08266/00028 Cadherin; PF05375 Pacifastin_I; PF02868/01447 Peptidase_M4; PF02037 SAP; PF06083 IL17; PF00002 7tm_2; PF07690 MFS_1; PF14830 Haemocyan_bet_s; PF13465/00096 zf-C2H2; PF05970 PIF1; PF00264 Tyrosinase; PF00582 Usp; PF00147 Fibrinogen_C; PF00024 PAN_1; PF01582 TIR; PF00092 VWA; PF01531 Glyco_transf_11; PF02931 Neur_chan_LBD; PF01607 CBM_14; PF05485 THAP. Columns: Mmu, Lch, Bfl, Obi, Cgi, Hro, Lgi, Pfu, Dme, Cte, Gga, Hsa, Cel, Xtr, Dre. Colour scale: row Z-score, –2 to 2.]
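The heat map in Figure 1b is coloured by row Z-score, that is, each value is expressed as a number of standard deviations from the mean of its own row. A minimal Python sketch of that scaling follows. The function name `row_zscores` and the example values are hypothetical, and the use of the sample standard deviation (n - 1) is an assumption; the paper does not state which estimator was used.

```python
from statistics import mean, stdev


def row_zscores(row):
    """Scale one heat-map row to zero mean and unit standard deviation."""
    mu = mean(row)
    sd = stdev(row)  # sample standard deviation (n - 1); an assumption
    if sd == 0:
        # A constant row carries no contrast, so every cell maps to 0.
        return [0.0 for _ in row]
    return [(x - mu) / sd for x in row]


# Hypothetical values for one row across four columns (not data from the paper).
example_row = [2.0, 4.0, 4.0, 10.0]
print(row_zscores(example_row))
```

Scaling each row separately makes domains or genes with very different absolute counts comparable on one colour scale, at the cost of hiding the absolute magnitudes.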

Figure 2 | Protocadherin expansion in octopus. a, Phylogenetic tree of cadherin genes in Hsa (red), Dme (orange), Nematostella vectensis (mustard yellow), Amphimedon queenslandica (yellow), Cte (green), Lgi (teal), Obi (blue), and Saccoglossus kowalevskii (purple). I, Type I classical cadherins; II, calsyntenins; III, octopus protocadherin expansion (168 genes); IV, human protocadherin expansion (58 genes); V, dachsous; VI, fat-like; VII, fat; VIII, CELSR; IX, Type II classical cadherins. Asterisk denotes a novel cadherin with over 80 extracellular cadherin domains found in Obi and Cte. b, Scaffold 30672 and Scaffold 9600 contain the two largest clusters of protocadherins, with 31 and 17 clustered genes, respectively. Clustered protocadherins vary greatly in genomic span and are oriented in a head-to-tail manner along each scaffold. c, Expression profiles of 161 protocadherins and 19 cadherins in 12 octopus tissues; 7 protocadherins were not detected in the tissues sampled. Cells are coloured according to number of standard deviations from the mean expression level. Protocadherins have high expression in neural tissues. Cadherins generally show a similar expression pattern, with the exception of a group of sucker-specific cadherins. For a larger version of panel a, see Extended Data Fig. 11.

[Figure 2c heat map. Columns: Ova, Testes, Viscera, PSG, Suckers, Skin, St15, Retina, OL, Supra, Sub, ANC. Colour scale: row Z-score, –3 to 3. Scale bars in panel b: 100 kb and 20 kb.]

Figure 3 | C2H2 ZNF expansion in octopus. a, Genomic organization of the largest C2H2 cluster. Scaffold 19852 contains 58 C2H2 genes that are transcribed in different directions. b, Expression profile of C2H2 genes along Scaffold 19852 in 12 octopus transcriptomes. Neural and developmental transcriptomes show high levels of expression for a majority of these C2H2 genes. c, Distribution of fourfold synonymous site transversion distances (4DTv) between C2H2-domain-containing genes. In a and b, arrow denotes scaffold orientation.

[Figure 3b heat map. Columns: ANC, Ova, Testes, Viscera, PSG, Suckers, Skin, St15, Retina, OL, Supra, Sub. Colour scale: row Z-score, –3 to 3. Figure 3c axes: 4DTv distance (x), fraction frequency (y); series: C2H2 zinc finger, tandem, and C2H2 zinc finger, non-tandem. Scale bar in panel a: 100 kb.]
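Figure 3c reports distances measured as transversions at fourfold synonymous (fourfold degenerate) third-codon positions, abbreviated 4DTv. The sketch below shows one common way to compute an uncorrected 4DTv value for a pair of aligned coding sequences. It is an illustration under stated assumptions, not the procedure used in the paper: the function name `four_dtv`, the standard genetic code, the gap-free in-frame alignment and the absence of any multiple-substitution correction are all assumptions.

```python
# First two bases of the eight fourfold-degenerate codon families
# in the standard genetic code (Leu, Val, Ser, Pro, Thr, Ala, Arg, Gly).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}
PURINES = {"A", "G"}


def four_dtv(cds_a, cds_b):
    """Uncorrected fraction of transversions at shared fourfold-degenerate sites."""
    if len(cds_a) != len(cds_b) or len(cds_a) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame and of equal length")
    sites = 0
    transversions = 0
    for i in range(0, len(cds_a), 3):
        codon_a = cds_a[i:i + 3].upper()
        codon_b = cds_b[i:i + 3].upper()
        # Use only codons whose first two bases agree and define a fourfold family.
        if codon_a[:2] != codon_b[:2] or codon_a[:2] not in FOURFOLD_PREFIXES:
            continue
        third_a, third_b = codon_a[2], codon_b[2]
        if third_a not in "ACGT" or third_b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        sites += 1
        # A transversion swaps a purine for a pyrimidine or the reverse.
        if (third_a in PURINES) != (third_b in PURINES):
            transversions += 1
    if sites == 0:
        raise ValueError("no shared fourfold-degenerate sites")
    return transversions / sites


# Hypothetical toy sequences (not from the octopus genome).
print(four_dtv("GCTGGACCAATG", "GCCGGCCCAATG"))
```

Because fourfold sites are free to change without altering the protein, this distance acts as a rough molecular clock for duplicate gene pairs, which is why a peak in its distribution can be read as a burst of duplication.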

squid protocadherin expansions may reflect the notable differences between octopuses and decapodiforms in brain organization, which have been most clearly demonstrated for the vertical lobe, a key structure in cephalopod learning and memory circuits2,22. Finally, the independent expansions and nervous system enrichment of protocadherins in coleoid cephalopods and vertebrates offers a striking example of convergent evolution between these clades at the molecular level.

As with the protocadherins, we found multiple clusters of C2H2 ZNF transcription factor genes (Fig. 3a and Supplementary Note 8.4). The octopus genome contains nearly 1,800 multi-exonic C2H2-containing genes (Table 1), more than the 200–400 C2H2 ZNFs found in other lophotrochozoans and the 500–700 found in eutherian mammals, in which they form the second-largest gene family23. C2H2 ZNF transcription factors contain multiple C2H2 domains that, in combination, result in highly specific nucleic acid binding. The octopus C2H2 ZNFs typically contain 10–20 C2H2 domains but some have as many as 60 (Supplementary Note 8.4). The majority of the transcripts are expressed in embryonic and nervous tissues (Fig. 3b). This pattern of expression is consistent with roles for C2H2 ZNFs in cell fate determination, early development and transposon silencing, as demonstrated in genetic model systems23.

The expansion of the O. bimaculoides C2H2 ZNFs coincides with a burst of transposable element activity at ~25 Mya (Fig. 3c). The flanking regions of these genes show a significant enrichment in a 70–90 base pair (bp) tandem repeat (31% for C2H2 genes versus 4% for all genes; Fisher’s exact test P value < 1 × 10^-16), which parallels the linkage of C2H2 gene expansions to β-satellite repeats in humans24. We also found an expanded C2H2 ZNF repertoire in amphioxus (Table 1), showing a similar enrichment in satellite-like repeats. These parallels suggest a common mode of expansion of a highly dynamic transcription factor family implicated in lineage-specific innovations.

To investigate further the evolution of gene families implicated in nervous system development and function, we surveyed genes associated with axon guidance (Table 1) and neurotransmission (Table 2), identifying their homologues in octopus and comparing numbers across a diverse set of animal genomes (Supplementary Notes 8–10). Several patterns emerged from this survey. The gene complements present in the model organisms D. melanogaster and C. elegans often showed striking departures from those seen in lophotrochozoans and vertebrates (Table 2 and Supplementary Note 10). For example, D. melanogaster encodes one member of the discs large (DLG) family, a key component of the postsynaptic scaffold. In contrast, mammals have four DLGs, which (along with other observations) led to suggestions that vertebrates possess uniquely complex synaptic machinery25. However, we found three DLGs in both octopus and limpet, suggesting that vertebrate and fly gene number differences are not necessarily diagnostic of exceptional vertebrate synaptic complexity (Supplementary Note 10.6).

Overall, neurotransmission gene family sizes in the octopus were very similar to those seen in other lophotrochozoans (Table 2 and Supplementary Note 10), except for a few strikingly expanded gene families such as the sialic acid vesicular transporters (sialins) (Supplementary Note 10.6). We did find variations in the sizes of neurotransmission gene families between human and lophotrochozoans (Table 2 and Supplementary Note 10), but no evidence for systematic expansion of these gene families in vertebrates relative to octopus or other lophotrochozoans. Although some gene families were larger in mammals or absent in lophotrochozoans (for example, ligand-gated 5-HT receptors), others were absent in mammals and present in invertebrates (for example, anionic glutamate and acetylcholine receptors). The complement of neurotransmission genes in octopus may be broadly typical for a lophotrochozoan, but our findings suggest it is also not obviously smaller than is found in mammals.

Among the octopus complement of ligand-gated ion channels, we identified a set of atypical nicotinic acetylcholine receptor-like genes, most of which are tandemly arrayed in clusters (Extended Data Fig. 7). These subunits lack several residues identified as necessary for the binding of acetylcholine26, so it is unlikely that they function as acetylcholine receptors. The high level of expression of these divergent subunits within the suckers raises the interesting possibility that they act as sensory receptors, as do some divergent glutamate receptors in other protostomes27. In addition, we identified 74 Aplysia-like and 11 vertebrate-like candidate chemoreceptors among the octopus GPCR superfamily of ~330 genes (Extended Data Fig. 6).

We found that a class of octopus-specific short interspersed nuclear element sequences (SINEs) is highly expressed in neural tissues (Supplementary Note 4 and Extended Data Fig. 8). Although the role of active transposons is unclear, elevated transposon expression in neural tissues has been suggested to serve an important function in learning and memory in mammals and flies28.

Transposable element insertions are often associated with genomic rearrangements29 and we found that the transposon-rich octopus genome displays substantial loss of ancestral bilaterian linkages that are conserved in other species (Supplementary Note 6 and Extended Data Fig. 9). Interestingly, genes that are linked in other bilaterians but not in octopus are enriched in neighbouring SINE content. SINE insertions around these genes date to the time of tandem C2H2 expansion (Extended Data Fig. 9d), pointing to a crucial period of genome evolution in octopus. Other transposons such as Mariner show no such enrichment, suggesting distinct roles for different classes of transposons in shaping genome structure (Extended Data Fig. 9c).

Transposable element activity has been implicated in the modification of gene regulation across several eukaryotic lineages29. We found that in the nervous system, the degree to which a gene’s expression is tissue-specific is positively correlated with the transposon load around that gene (r² values ranging from 0.49 in the optic lobe to 0.81 in the subesophageal brain; Extended Data Fig. 8 and Supplementary Note 4). This correlation may reflect modulation of gene expression by transposon-derived enhancers or a greater tolerance for transposon insertion near genes with less complex patterns of tissue-specific gene regulation.

Using a relaxed molecular clock, we estimate that the octopus and squid lineages diverged ~270 Mya, emphasizing the deep evolutionary history of coleoid cephalopods8,30 (Supplementary Note 7.1 and Extended Data Fig. 10a). Our analyses found hundreds of coleoid- and octopus-specific genes, many of which were expressed in tissues containing novel structures, including the chromatophore-laden skin, the suckers and the nervous system (Extended Data Fig. 10 and Supplementary Note 11). Taken together, these novel genes, the

Table 2 | Ion channel subunits

                                             Obi   Aca   Lgi   Cte   Dme   Cel   Hsa
Voltage-gated calcium channels                 8     8     6    10     9    10    10
Voltage-gated sodium channels                  3     2     3     2     4     0    13
Transient receptor potential channels         36    45    40    43    13    23    29
K+ channels
Voltage-gated                                 30    23    29    20    10    51    40
Calcium-activated, small/large conductance    12     8     9     6     3     6     8
Inward rectifying                              3     4     5     6     4     3    16
Two pore                                      12     9    12    14    11    47    15
Non-voltage-gated                             27    21    26    26    18    72    39
Cys-loop receptors
Glutamate                                     21    15    47    36    30    15    18
Nicotinic acetylcholine                       53    16    52    77    10    88    16
Inhibitory acetylcholine                       3     2     5     2     0     4     0
5-HT3                                          0     0     0     0     0     1     5
GABA                                           6     5     4     9     3     7    19
Glutamate-gated chloride channels              7     5     8     5     1     6     0

Number of subunits of representative ion channel families in O. bimaculoides and across examined taxa. Dendrogram above species names shows their evolutionary relationships. Aca, Aplysia californica.

S. OIST/Berkeley group: O. Evol. the effect of data partitioning on resolving phylogenies in a Bayesian infers three paleopolyploidies in the mollusca. Gene 509. L.. The evolution of flexible behavioral repertoires in (IOS-1354898) and NIH (R03 HD064887) to C.M. Drummond. Begovic for genomic DNA preparation. Nithianantharajah. R. Liu. Mota. N. Genome Res. Cellular scaling rules for rodent brains. A. 4. and the University of Chicago Functional Genomics Facility. supported by NIH S10 instrumentation grants S10RR029668 and and the origins of vertebrate development. 432–439 (2012).W.. unless indicated otherwise in the credit line. org/licenses/by-nc-sa/3. D.R. & Myers. T. dissection. accepted 16 June 2015. Author Contributions The Chicago and the OIST/Berkeley groups initiated their 13. Genome structure analysis of molluscs revealed whole genome Supplementary Information is available in the online version of the paper. J. 16–24 (2013). Bioessays 33.R. 354–366 (2004). USA 99. E. Graveley. Technology Graduate University (S. neural excitability.W. F. Sci. 3. are available in the online version of the paper.. 6. A browser of this 15. Deep vertebrate roots for that contributed to the evolution of cephalopod neural complexity and mammalian zinc finger transcription factor subfamilies. T.R. J. Crystal structure of an ACh-binding protein reveals the ligand- Received 26 December 2014.B. V.. Hallinan.S.. 74. Natl Acad. Brown and J. genome duplication in the yeast Saccharomyces cerevisiae. RESEARCH LETTER expansion of C2H2 ZNFs..Y. Mol. Coates Genomics Sequencing Laboratory at UC 11. protocadherins. 304–307 (2004). Caruso. J. In the subsequent collaboration. 7–15 (2012).idyll. M. and NonCommercial-ShareAlike 3. Correspondence and requests for materials should be addressed (2002). This work used the Vincent J. M. J.. R. J. Brandenburger. F. Noonan. Neurosci. Synaptic scaffold evolution generated components of to these sections appear only in the online paper. (dsrokhsar@gmail. 
Duboule. Development 140. S. Gui for bioinformatic assistance. 21.Y.0 2 2 4 | N AT U R E | VO L 5 2 4 | 1 3 AU G U S T 2 0 1 5 G2015 Macmillan Publishers Limited.metazome. Genome Biol. Sci. visit http://creativecommons. Simakov. and extensive 21. 791–808 (1998). Grimwood. H.-G. e1001064 1. morphological innovations. Nature 453. et al. H. J. 1978). HOX genes in the sepiolid squid Euprymna scolopes: implications no competing financial interests. 12138–12143 (2006). J. 2088–2093 version of the paper. All rights reserved . S. references unique 25. 23. Genome Res. Nature Neurosci. Creative Commons licence. 14. vertebrate cognitive complexity. D. third party material in this article are included in the article’s Creative Commons licence. Kro¨ger. B. Mobile DNA elements in the generation of 2. Proof and evolutionary analysis of ancient transcriptome and genome projects independently. Callaerts. Hanlon. T. Science 304. H. Eichler.. phylogeny of coleoid cephalopods (Mollusca: Cephalopoda) using a multigene 5. duplication and lineage specific repeat variation. et al. B. W. The Ashbya gossypii genome as a tool for mapping the ancient Saccharomyces cerevisiae genome. J. Jackson.S. Neuron 74. Loligo pealeii (Squid) Data Dump (http:// to reproduce the material. C. & Sanes. if the material is not included under the 3297–3302 (2013). Cephalopod origin and evolution. 37. Herculano-Houzel. et al. J. T. Octopus: Physiology and Behaviour of an Advanced Invertebrate on eukaryotic genomes: from genome size increase to genetic adaptation to (Chapman and Hall.org/blog/2014-loligo-transcriptome-data. and C. & Maniatis. M. H. J. & Messenger. (cragsdale@uchicago.R. 497–506 (2014). P. S. 510–525 (2014). & Gage. Putnam. Orenstein. & Basil. Nature Rev. and Source Data. 16. Che´nais. 19.B. Brain Behav. Ancient protostome origin of chemosensory ionotropic glutamate receptors and the evolution of insect taste and olfaction. B. & Lent.net/). Clustered protocadherins. M. Brown. 6. Grasso.. 1971).. 
2549–2560 genome assembly is available at (http://octopus. This work is licensed under a Creative Commons Attribution- 18. 1064–1071 (2008). Hiard. for genomic DNA isolation. J. 231–245 (2009). PLoS Genet. Molecular approach to the taxonomy of Nautilus. P. 125–133 S10RR027303. (2010). P. 26. to Z. Young. Biol. & Sidow.. This work was 9. M.W. Nature 411. Readers are welcome to comment on the online for the evolution of complex body plans. Development 134.. et al. Gene duplications Berkeley. The authors declare 16. & Stubbs. Marchetto.B. Glotzer and H.. E. Holland. 343–353 (2010).R. Birren. Caldwell for providing the O. J. Alternative sites of synaptic plasticity in two homologous ‘‘fan-out changes in an otherwise ‘typical’ lophotrochozoan gene complement fan-in’’ learning and memory networks. 133–138 (2004).S. et al. Cell 143.W. T.A.. Biol. 1773–1782 (2011).nature. W. A. Z. J. Vinther. E. permissions information is available at www.. Press. V. C. 6. R.) and by funding from the NSF 10. C. Y. 63–71 (2011). A-to-I RNA editing: effects on proteins key to RNA-seq data available before publication. and E. in the SRA as BioProjects PRJNA270931 and PRJNA285380.R. M. Lu. B.R. 28. 8. Huffard and R. Wells. Ozouf-Costaz. Evol. (2004). binding domain of nicotinic receptors. 14. et al. The impact of transposable elements 3. Evol. Proc. & Rosenthal. Ha. H. Bonnaud. by NIH grant UL1 TR000430. The amphioxus genome and the evolution of the chordate Author Information Genome and transcriptome sequence reads have been deposited karyotype. Nature 428. Complex b-satellite repeat structures and the expansion of the Online Content Methods. N. et al.com). Sun. C. 617–624 both groups worked closely on every aspect of the project. L. K. The images or other neural circuit assembly. S. Kellis. 1996). Shomrat. R. 15.W.). 27. Rosenthal for making Doryteuthis 7.P. T.. supported (1994). J. stressful environments. 29. Strugnell. S. (2011). Genome Biol. Development (Suppl. 
and from the NSF (DGE-0903637) cephalopod molluscs. Phylogenet. Brejc. J. to C. 526–531 (2013). Chang. Press. The Anatomy of the Nervous System of Octopus vulgaris (Clarendon diversity and complexity in the brain. N. supported by the Molecular Genetics Unit of the Okinawa Institute of Science and Proc. Garcia-Fernandez. Shigeno for help with tissue 8. Erwin. W.S.. Insights into bilaterian evolution from three spiralian genomes.E. D. Chicago group: C. J. & Lander. 1150–1163 framework. M. J. 269–276 (2001). Cephalopod Behaviour (Cambridge Univ. F. N. & Boucher-Rodoni.com/reprints. & Seeburg. J. Dietrich. 426–441 (2005). J. The rise and fall of Hox gene clusters. To view a copy of this licence. and D. R. Reprints and (2007). W. Natl Acad. Z.. Evol. S. P. et al.. R... & Fuchs. A. C. Rosenthal. Schmutz. et al. M. ivory. & Cooper. & Lindberg. Nature 493.. A molecular and karyological 30. R. along with any additional Extended Data display items zinc finger gene cluster in 19p12. L. A. Gene conversion and the evolution of protocadherin gene cluster diversity. A. A. J. A. & Casse. resulting in 22. Dickson. 12. Curr. M. 17. Norman. and cis-regulatory elements in the octopus genome. J. Gene 483. J. E. Zipursky. C.. Comparative analysis of chromosome counts approach..html) (2014). and D. 327. 24.0 Unported licence.. Croset.edu) or D. Yoshida. genome rearrangements. Chen. O. L. B. users will need to obtain permission from the licence holder 20. Chemoaffinity revisited: Dscams. Acknowledgements We thank C. A. bimaculoides specimen used 602–613 (2011). USA 103. X. B. et al. transposable element activity yield a new landscape for both trans. Williams.

METHODS

Data access. Genome and transcriptome sequence reads are deposited in the SRA as BioProjects PRJNA270931 and PRJNA285380. The genome assembly and annotation are linked to the same BioProject ID. A browser of this genome assembly is available at (http://octopus.metazome.net/).

Genome sequencing and assembly. Genomic DNA from a single male Octopus bimaculoides31 was isolated and sequenced using Illumina technology to 60-fold redundant coverage in libraries spanning a range of pairs from ~350 bp to 10 kilobases (kb). These data were assembled with meraculous32 achieving a contig N50-length of 5.4 kb and a scaffold N50-length of 470 kb. The longest scaffold contains 99 genes and half of all predicted genes are on scaffolds with 8 or more genes (Supplementary Note 1).

Genome size and heterozygosity. The O. bimaculoides haploid genome size was estimated to be ~2.7 gigabases (Gb) based on fluorescence (2.66–2.68 Gb) and k-mer (2.86 Gb) measurements (Supplementary Notes 1 and 2), making it several times larger than other sequenced molluscan and lophotrochozoan genomes17. We observed nucleotide-level heterozygosity within the sequenced genome to be 0.08%, which may reflect a small effective population size relative to broadcast-spawning marine invertebrates.

Transcriptome sequencing. Twelve transcriptomes were sequenced from RNA isolated from ova, testes, viscera, posterior salivary gland (PSG), suckers, skin, developmental stage 15 (St15)33, retina, optic lobe (OL), supraesophageal brain (Supra), subesophageal brain (Sub), and axial nerve cord (ANC) (Supplementary Note 2). RNA was isolated using TRIzol (Invitrogen) and 100-bp paired-end reads (insert size 300 bp) were generated on an Illumina HiSeq2000 sequencing machine.

De novo transcriptome assembly. Adapters and low-quality reads were removed before assembling transcriptomes using the Trinity de novo assembly package (version r2013-02-25 (refs 34, 35)). Assembly statistics are summarized in Supplementary Table 2. Following assembly, peptide-coding regions were translated using TransDecoder in the Trinity package. We compared the de novo assembled RNA-seq output to the genome to evaluate the completeness of the genome assembly. To minimize the number of spuriously assembled transcripts, only transcripts with ORFs predicted by TransDecoder were mapped onto the genome with BLASTN. Only 1,130 out of 48,259 transcripts with ORFs (2.34%) did not have a match in the genome with a minimum identity of 95%.

Annotation of transposable elements. Transposable elements were identified with RepeatScout and RepeatModeler36, and the masking was done with RepeatMasker37. The most abundant transposable element is a previously identified octopus-specific SINE38 that accounts for 4% of the assembled genome.

Annotation of protein-coding genes. Protein-coding genes were annotated by combining transcriptome evidence with homology-based and de novo gene prediction methods (Supplementary Note 4). For homology prediction we used predicted peptide sets of three previously sequenced molluscs (L. gigantea, C. gigas, and A. californica) along with selected other metazoans. Alternative splice isoforms were identified with PASA39. Annotation statistics are provided in Supplementary Table 4. Genes known in vertebrates to have many isoforms, such as ankyrin, TRAK1 and LRCH1, also show alternative splicing in octopus but at a more limited level. Octopus genes with ten or more alternative splice forms are provided in Supplementary Table 4.

Calibration of sequence divergence with respect to time. The divergence between squid and octopus was estimated using r8s40 by fixing cephalopod divergence from bivalves and gastropods to 540 Mya8. Our estimate of 270 Mya for the squid–octopus divergence corresponds to a mean neutral substitution rate of dS ~2 based on the protein-directed CDS alignments between the species (Supplementary Fig. 2) and a dS estimation using the yn00 program41. Throughout the manuscript we convert from sequence divergence to time by assuming that dS ~1 corresponds to 135 million years. For example, unlinked octopus protocadherins appear to have expanded ~135 Mya based on mean pairwise dS ~1, after octopuses diverged from squid. In contrast, clustered octopus protocadherins are much more similar in sequence (mean pairwise dS ~0.4, or ~55 Mya).

Quantifying gene expression. Transcriptome reads were mapped to the genome assembly with TopHat 2.0.11 (ref. 42). A range of 76–90% of reads from the different samples mapped to the genome. Mapped reads were sorted and indexed with SAMtools43. The read counts in each tissue were produced with the BEDTools multicov program44 using the gene model coordinates. The counts were normalized by the total transcriptome size of each tissue and by the length of the gene. Heat maps showing expression patterns were generated in R using the heatmap.2 function.

Gene complement. Gene families of particular interest, including developmental and regulatory genes, neural-related genes, and gene families that appear to be expanded in O. bimaculoides, were manually curated and analysed. We searched the octopus genome and transcriptome assemblies using BLASTP and TBLASTN with annotated sequences from human, mouse and D. melanogaster. Bulk analyses were also performed using Pfam45 and PANTHER46. We used BLASTP and TBLASTX to search for specific gene families in deposited genome and transcriptome databases for L. gigantea, A. californica, C. gigas, C. teleta, T. castaneum, D. melanogaster, C. elegans, N. vectensis, A. queenslandica, S. kowalevskii, B. floridae, C. intestinalis, D. rerio, M. musculus and H. sapiens. Candidate genes were verified with BLAST47 and Pfam45 analysis. Genes identified in the octopus genome were confirmed and extended using the transcriptomes. Multiple gene models that matched the same transcript were combined. The identified sequences from octopus and other bilaterians were aligned with either MUSCLE48, CLUSTALO49, MacVector 12.6 (MacVector, North Carolina), or Jalview50. Phylogenetic trees were constructed with FastTree51 using the Jones–Taylor–Thornton model of amino acid evolution, and visualized with FigTree v1.3.1.

Synteny. Microsynteny was computed based on metazoan node gene families (Supplementary Note 7). To simplify gene family assignments we limited our analyses to 4,033 gene families shared among human, amphioxus, Capitella, Helobdella, Octopus, Lottia, Crassostrea, Drosophila and Nematostella. We required ancestral bilaterian syntenic blocks to have a minimum of one species present in both ingroups, or in one ingroup and one outgroup. We used Nmax 10 (maximum of 10 intervening genes) and Nmin 3 (minimum of three genes in a syntenic block) according to the pipeline described in ref. 17 (Supplementary Note 6). To examine the effect of fragmented genome assemblies, we simulated shorter assemblies by artificially fragmenting genomes to contain on average 5 genes per scaffold (Supplementary Note 6). In comparison with other bilaterian genomes, we find that the octopus genome is substantially rearranged. In looking at microsyntenic linkages of genes with a maximum of 10 intervening genes, we found that octopus conserves only 34 out of 198 ancestral bilaterian microsyntenic blocks. In contrast, the limpet Lottia and amphioxus retain more than twice as many such linkages (96 and 140, respectively). This difference remains significant after accounting for genes missed through orthology assignment as well as simulations of shorter scaffold sizes (Supplementary Note 6, Extended Data Fig. 8). Scans for intra-genomic synteny, and doubly conserved synteny with Lottia, were performed as described in Supplementary Note 6.

Transposable elements and synteny dynamics. The 5 kb upstream and downstream regions of genes were surveyed for transposable element (TE) content. For genes with non-zero TE load, their assignment to either conserved or lost bilaterian synteny in octopus was done using the microsynteny calculation described above. The number of genes for each category and TE class were as follows: 484 genes for retained synteny and 1,193 genes in lost synteny for all TE classes; 440 and 1,107, respectively, for SINEs; and 116 and 290, respectively, for Mariner. Wilcoxon U-tests for the difference of TE load in linked versus non-linked genes were conducted in R. To assess transposon activity we assigned transcriptome reads aligned to 5,685,558 annotated transposon loci using BEDTools44. Of these, 496,265 loci showed expression in at least one of the tissues.

RNA editing. To identify polymorphic positions in the genome, SNPs and indels were predicted using GATK HaplotypeCaller version 3.1-1 in discovery mode with a minimum Phred scaled probability score of 30, based on an alignment of the 350 bp and 500 bp genomic fragment libraries using BWA-MEM version 0.7.6a. RNA-seq reads were mapped to the genome with TopHat52. SAMtools43 was used to identify SNPs between the genomic and the RNA sequences. Using BEDTools44, we removed SNPs predicted in both the transcriptome and the genome and discarded SNPs that had a Phred score below 40 or were outside of predicted genes. SNPs were binned according to the type of nucleotide change and the direction of transcription. Candidate edited genes were taken as those having SNPs with A-to-G substitutions in the predicted mRNA transcripts.

Cephalopod-specific genes. Cephalopod novelties were obtained by BLASTP and TBLASTN searches against the whole NR database53 and a custom database of several mollusc transcriptomes (Supplementary Note 11.1). To ensure that we had as close to full-length sequence as possible, we extended proteins predicted from octopus genomic sequence with our de novo assembled transcriptomes, using the longest match to query NR, transcriptome and EST sequences from other animals. Gene sequences with transcriptome support but without a match to non-cephalopod animals at an e-value cutoff of 1 × 10⁻³ were considered for further analysis. Octopus sequences with a match of 1 × 10⁻⁵ or better to a sequence from another cephalopod were used to construct gene families, which were characterized by their BLAST alignments, PFAM-A/B, HMM, and UNIREF90 hits. The cephalopod-specific gene families are listed in the Source
Data file for Extended Data Fig. 7. Octopus-specific novelties were defined as sequences with transcriptome support but without any matches to sequences from any other animals (~1 × 10⁻³), including nautiloid and decapodiform cephalopods.

31. Pickford, G. E. & McConnaughey, B. H. The Octopus bimaculatus problem: a study in sibling species. Bulletin of the Bingham Oceanographic Collection 12, 1–66 (1949).
32. Chapman, J. A. et al. Meraculous: de novo genome assembly with short paired-end reads. PLoS ONE 6, e23501 (2011).
33. Naef, A., Boletzky, S. v. & Roper, C. F. E. Cephalopoda. Embryology (Smithsonian Institution Libraries, 2000).
34. Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-seq data without a reference genome. Nature Biotechnol. 29, 644–652 (2011).
35. Haas, B. J. et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nature Protocols 8, 1494–1512 (2013).
36. Smit, A. F. A. & Hubley, R. RepeatModeler Open-1.0 (2008–2010).
37. Smit, A. F. A., Hubley, R. & Green, P. RepeatMasker Open-3.0 (1996–2010).
38. Ohshima, K. & Okada, N. Generality of the tRNA origin of short interspersed repetitive elements (SINEs). Characterization of three different tRNA-derived retroposons in the octopus. J. Mol. Biol. 243, 25–37 (1994).
39. Haas, B. J. et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 31, 5654–5666 (2003).
40. Sanderson, M. J. r8s: inferring absolute rates of molecular evolution and divergence times in the absence of a molecular clock. Bioinformatics 19, 301–302 (2003).
41. Yang, Z. & Nielsen, R. Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol. Biol. Evol. 17, 32–43 (2000).
42. Trapnell, C., Pachter, L. & Salzberg, S. L. TopHat: discovering splice junctions with RNA-seq. Bioinformatics 25, 1105–1111 (2009).
43. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
44. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
45. Finn, R. D. et al. Pfam: the protein families database. Nucleic Acids Res. 42, D222–D230 (2014).
46. Mi, H., Muruganujan, A. & Thomas, P. D. PANTHER in 2013: modeling the evolution of gene function, and other gene attributes, in the context of phylogenetic trees. Nucleic Acids Res. 41, D377–D386 (2013).
47. Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).
48. Edgar, R. C. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5, 113 (2004).
49. Sievers, F. et al. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol. Syst. Biol. 7, 539 (2011).
50. Waterhouse, A. M., Procter, J. B., Martin, D. M. A., Clamp, M. & Barton, G. J. Jalview Version 2–a multiple sequence alignment editor and analysis workbench. Bioinformatics 25, 1189–1191 (2009).
51. Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree 2–approximately maximum-likelihood trees for large alignments. PLoS ONE 5, e9490 (2010).
52. Trapnell, C. et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nature Protocols 7, 562–578 (2012).
53. Pruitt, K. D., Tatusova, T. & Maglott, D. R. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 33, D501–D504 (2005).
54. Palavicini, J. P., O'Connell, M. A. & Rosenthal, J. J. C. An extra double-stranded RNA binding domain confers high activity to a squid RNA editing enzyme. RNA 15, 1208–1218 (2009).
55. Huelsenbeck, J. P. & Ronquist, F. MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17, 754–755 (2001).
56. Starnes, T., Broxmeyer, H. E., Robertson, M. J. & Hromas, R. Cutting edge: IL-17D, a novel member of the IL-17 family, stimulates cytokine production and inhibits hemopoiesis. J. Immunol. 169, 642–646 (2002).
57. Cummins, S. F. et al. Candidate chemoreceptor subfamilies differentially expressed in the chemosensory organs of the mollusc Aplysia. BMC Biol. 7, 28 (2009).
58. van Nierop, P. et al. Identification of molluscan nicotinic acetylcholine receptor (nAChR) subunits involved in formation of cation- and anion-selective nAChRs. J. Neurosci. 25, 10617–10626 (2005).
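
The Methods describe two simple calculations that can be illustrated with a short sketch: the conversion of synonymous divergence to time (assuming that dS ~1 corresponds to 135 million years) and the normalization of read counts by transcriptome size and gene length. The Python code below is only an illustration under stated assumptions and is not the authors' pipeline. The function names are hypothetical, and the per-kilobase and per-million scaling in the normalization is an assumed convention, because the Methods give no scaling constants.

```python
"""Illustrative helpers for two calculations described in the Methods.

Not the authors' code: names and scaling factors are assumptions.
"""

# From the Methods: dS ~1 is taken to correspond to 135 million years.
MYR_PER_UNIT_DS = 135.0


def ds_to_mya(ds):
    """Convert a mean pairwise dS value to an approximate age in Mya."""
    if ds < 0:
        raise ValueError("dS must be non-negative")
    return ds * MYR_PER_UNIT_DS


def normalized_expression(read_count, gene_length_bp, total_reads):
    """Normalize a read count by gene length and by library size.

    Assumed convention: reads per kilobase of gene per million reads
    in the tissue library (the Methods state only that counts were
    normalized by transcriptome size and gene length).
    """
    if gene_length_bp <= 0 or total_reads <= 0:
        raise ValueError("gene length and library size must be positive")
    reads_per_kb = read_count / (gene_length_bp / 1_000)
    return reads_per_kb / (total_reads / 1_000_000)


if __name__ == "__main__":
    examples = [
        ("squid-octopus divergence", 2.0),
        ("unlinked protocadherins", 1.0),
        ("clustered protocadherins", 0.4),
    ]
    for label, ds in examples:
        print(f"{label}: dS {ds} -> about {ds_to_mya(ds):.0f} Mya")
    print(normalized_expression(500, 2_000, 10_000_000))
```

With this constant, dS values of 2, 1 and 0.4 map to roughly 270, 135 and 54 Mya, in line with the rounded dates quoted in the Methods.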

Extended Data Figure 1 | RNA editing in octopus. a, Approximate maximum likelihood tree of adenosine deaminases acting on RNA (ADARs) in bilaterians. ADAR1, ADAR2, ADAR-like/ADAD, and ADAT (tRNA-specific adenosine deaminase) were identified in Hsa, Mmu, Dme, Cin, Cte, Lgi, D. opalescens (Dop54), and Obi with Shimodaira–Hasegawa-like support indicated at the nodes. b, O. bimaculoides ADAR1, ADAR2 and ADAR-like proteins contain one or two double-stranded RNA binding domains (dsRBD) as well as an adenosine deaminase domain. ADAR1 also has a z-alpha domain. c, Expression profiles of the three ADAR genes found in 12 O. bimaculoides tissues by RNA-seq profiling. d, DNA–RNA differences in O. bimaculoides show prominent A-to-G changes. Histogram illustrates the number of DNA–RNA differences detected between coding sequences in the genome and 12 O. bimaculoides transcriptomes after filtering out polymorphisms identified in genomic sequencing. Differences were binned by the type of change (see key) in the direction of transcription. A-to-G changes are the most prevalent, particularly in neural tissues and during development, paralleling the expression of octopus ADARs in c. Other types of changes were also detected at lower levels, possibly resulting from uncharacterized polymorphisms.

Extended Data Figure 2 | Local arrangement of Hox gene complement in O. bimaculoides and selected bilaterians. Genes are positioned to illustrate orthology, which is also highlighted by colour. Scaffold length is depicted to scale with size noted on the left. At the top, the four compact Hox clusters of H. sapiens and the single B. floridae cluster are depicted. The D. melanogaster Hox complex is split into two clusters. We included genes in the D. melanogaster locus that are homologues of Hox genes but have lost their homeotic function, such as fushi tarazu (ftz), bicoid, zen and zen2 (the latter three are represented as overlapping boxes). Hox genes in C. teleta are found on three scaffolds17. L. gigantea has a single cluster with the full known lophotrochozoan gene complement. In O. bimaculoides many of the scaffolds are several hundred kb long, and no two Hox genes are on the same scaffold. The positions of O. bimaculoides genes approximate their locations on the scaffolds. Dashed lines indicate that the scaffold continues beyond what is shown.

Extended Data Figure 3 | Gene complement and gene architecture evolution in metazoans. a, Principal component analysis of gene family counts. O. bimaculoides highlighted in green. Deuterostomes are indicated in blue, ecdysozoans in red, lophotrochozoans in green, and sponges and cnidarians in orange. Adi, Acropora digitifera; Ava, Adineta vaga; Dpu, Daphnia pulex; Gga, Gallus gallus; Hma, Hydra magnipapillata; Isc, Ixodes scapularis; Tca, Tribolium castaneum; Xtr, Xenopus tropicalis; Spu, S. purpuratus. For methods, see Supplementary Note 7.3. b–d, MrBayes55 tree (constrained topology) on binary characters of presence or absence of Pfam domain architectures (b), introns (c), or indels (d); scale bar represents estimated changes per site. For methods, see Supplementary Note 7.4.

R682 C6X_ G_ZO oR oL48 83_C 6TSF OWJQ oA oA84 F_UO QV opA8 _Z UO 85 oL oR 847_ _C6S oD 846_ C5 oRR6 31_C 6TSF_ZUWJQV TF X_ V 40_CUOW C5 C5 WV R4 TS E_ZO UO oRR7 21_C TS_Z UOWJ 85 C4 WJV QV V oH R571 SGFX _IUOW W _C6TS5TFX _ZUO V oA X_ZU _ZU 83 55 oRR5 75_C6 TSF_Z OWJQV 4_ oR 852_C6 6TSX OW DFX__ZUOWVW oRR1 69_C6 TS_ZU JQV oA 848_ OW 4__ JQ 47_C _ZU FX4_ OWJQ IZU oH oRR5 19_C6 ZUOW QV oR oA83 411_ C5_Z _Z R8 KTSF 438_C WQV C4 oR 4_C6 6DTS X2_IZ oRR5 C6T_I UOWJ oD439 UOWQ U 9_ C5_Z U oP0 218_C6 V oL48 83_C 6TSF OWJQ _C4T_ V UO oRR OW ZV oR 5_Z oD 846_ C5 oRR6 31_C 6TSF_ZUWJQV oRR W oD440 855_C 2_C4 0_ R4 TS E_ZO UO oD437_ oRR837 _C3_Z oRR7 21_C TS_Z UOWJ oRR 85 C4 WJV QV V OW V V oA oRR027C6TSF_ _C4_Z _C6TS5TFX _ZUO V 843_ 841_ oRR 83 55 oRR5 75_C6 TSF_Z OWJQV SF_ZUO QV 4_ oRR172 C5TS_IZ DFX__ZUOWVW oA827_C6AS_ZUOW WV 93_ oA84 A840_ZUO ZO oRR1 69_C6 TS_ZU JQV ST oRR549_ C6TS_Z WJQV oD440 855_C 2_C4 _C4_ 84 JQ oA 443_ oA826_C66S_ZUOW JQV 47_C _ZU FX4_ OWJQ C6 oRR169_ 6T_IZUO UOWJQV oRR5 19_C6 ZUOW QV oA833_C6 T_ZUOWJJQV oM055_C TKDF_IZ JQV KTSF 438_C WQV QV oM061_C6 HTS_IZUW 727_C6 BX_IZU WJQV oRR5 C6T_I UOWJ WJQV oN658_C6 _ZU oD439 UOWQ U ZOW oN659_C6 T_IZUOWJQV 799_C6 TF_ZUO JQV oP0 218_C6 oT078_C5 5_IZUOWJQ V _ZUO JQ oA84 pA V oRR _C4T_ZVV opRR161_C 117_C6 T_IZOW JQV oA810_C6TF _C6_ZUOW oRR opT761_C3_OJQV oR 5_Z WJ oRR OWJQV T_IZUOWJQ WJQV 550 oD437_ oRR837 _C3_Z JQ oD X_Z oRR oRR730_C6TX_ZUOWJQV OW oM065_C6TFX_IZUOWJQ oA oR SF_ZOW _Z o 85 843_ 841_ oRR op C5 _UOWJQ QV oRR172 C5TS_IZ oA811_C6TSF_ZUWJQ oA827_C6AS_ZUOW WV 93_ ZO oRR549_ C6TS_Z WJQV oA 443_ oA826_C66S_ZUOW JQV C6 oRR169_ 6T_IZUO UOWJQV _C6 oA819_C6SX_IZUOW oA833_C6 T_ZUOWJJQV oD _C4 _ZUOW oM055_C TKDF_IZ oRR QV TX_IZUO oM061_C6 HTS_IZUW T_Z 727_C6 BX_IZU WJQV UO WJQV oN658_C6 _ZU oN659_C6 T_IZUOWJQV 799_C6 TF_ZUO JQV T_IUW OWJQV oT078_C5 5_IZUOWJQ _ZUO JQ V opRR161_C _ZUJQV 117_C6 T_IZOW JQV oA810_C6TF _C6_ZUOW ZUO oRR783_C6T ZOWJQV opT761_C3_OJQV oA836_C6TX_ ZOWV WJ opB320 oA820_C6TS_ZUO OWJQV 
T_IZUOWJQ oA838_C oA838 V WJQV 550 IZW V oA837_C6TSX_ZUO Q oD opT761_C2DTS_OWJQ X_Z oA811_C6TSF_ZUWJQV opB320_C6TS_ZOWV OWV oRR730_C6TX_ZUOWJQV _C6 JQ oM065_C6TFX_IZUOWJQ oA816_C6HTBFDX_ZUOWJQV oRR261_C6TX_IZUOWJQ opT235-12_C5_U0JQ oA835_C6_ZUO oA821_C6TD_ZUOWJQV oA822_C6T_ZUOWJQV oRR804_C6_IZUOWV 85 opA824_C5TF_U oA819_C6SX_IZUOWJ _ 53_C6 _C6 oD oA828_C _C6SF_ oA _ZUOW opT235-14_C5_U0 WJV V QV TX_IZUO T_Z oA832 V JQV T_IUW OWJQV V oP088_C6 oA828_C _ UOW UOW oA820_C6TS_ZUO QV IZW C6T _C6TS_ C6TS_IZ opA824_C5TF_U oA8 6TS_IZO 53_C6 oA QV oA832 oRR027 JQV JQV V oP088_C6 UOW UOW QV V oA8 JQV JQV WV V 50. with the exception of Ocbimv220039316m.B4_1 V ma.1 ph iPC 90 . All rights reserved . a. As seen in b. Order of the genes in the heat maps b. Phylogenetic tree highlighting Scaffold Over three-quarters of the protocadherins are highly expressed throughout 9600 protocadherins in grey bars. d. bars.0 Extended Data Figure 4 | Protocadherin genes within a genomic cluster are Scaffold 9600. while the others show more mixed distributions. protocadherins of the same central brain. Phylogenetic tree highlighting Scaffold 30672 protocadherins in grey (a. c) follows the ordering on the corresponding scaffold. 1 Hsa_PCD H_alpha-11_ 1 Hsa_PCDH 10_1 Hsa_PCD H_alpha-10_11 Hsa_PCDH_alpha-12_ H_al 1 Hsa_PCDH pha-12_1 Hsa_PCDH__alpha-1_1 Hsa_PCDH_ alpha-2_1 Hsa_PCDH__alpha-1_1 alpha-13_1 Hsa_PCDH_ alpha-2_1 Hsa_PCDH_a alpha-13_1 Hsa_PCDH_alph lpha-3_1 Hsa_PCDH_a Hsa_PCDH_alp lpha-3_1 Hsa_PCDH_alpha. 
RESEARCH LETTER a c 0810m 39309m 0811m 0816m 39310m 0819m 0820m 39311m 0821m 0822m 39312m 0824m 0826m 39316m 0827m 0828m 39317m Sca old 30672 Sca old 9600 0830m 39318m 0832m 0833m 39320m 0835m 0836m 39322m 0837m 0838m 39323m 0839m 0840m 39324m 0841m 0842m 39326m 0843m 39327m 0844m 0846m 39328m 0848m 0851m 39329m 0852m 0853m 39332m 0854m 0855m 39333m Testes Sub St15 Retina OL PSG Supra ANC Viscera Ova Skin Testes Sub Suckers St15 Retina OL PSG Supra ANC Viscera Ova Skin Suckers 3 2 1 0 1 2 3 Row Z-Score b d s_ pc s_ pc dh 11 Ct e_ dh 11 Ct e_ _X lin Ct e_ 13 95 _X lin ke d_ Ct e_ 13 95 13 96 22 ke d_ 13 96 22 XP 09 XP _0 02 _0 02 09 Am 74 13 Am 74 13 ph iPC 90 . which is most protocadherin genes located on Scaffold 30672 in 12 octopus transcriptomes. oRR576_ 6A_ZUOWOWV Hsa_PC DH_gam A11_1 oRR C3_Z Hsa_PCDDH_gamma. Expression profile of the 31 nervous tissues. c. 026 oT096_ C2_Z Hsa_PCDH_gam _1 oG808_ _C3A_Z Hsa_PC DH_gam A11_1 ma. OL and ANC. a-4_1 Hsa_PCDH_alpha-97_1 Hsa_PCDH_alphha-4_1 _2 Hsa_PCDH_alpha-9 a-7_1 Hsa_PCDH_alpha-9_1 _2 Hsa_PCDH_beta-13 Hsa_PCDH_alpha-9_1 Hsa_PCDH_beta-8 Hsa_PCDH_beta-13 Hsa_PCDH_beta-15 Hsa_PCDH_beta-8 Hsa_PCDH_beta-6 Hsa_PCDH_beta-15 Hsa_PCDH_beta-5 Hsa_PCDH_beta-6 oE098_C4MR_IZV Hsa_PCDH_beta-5 Hsa_PCDH_beta-4 oE098_C4MR_IZV oRR092_C2_IZ Hsa_PCDH_beta-3 Hsa_PCDH_beta-4 oRR444_C2Y_IZ oRR092_C2_IZ Hsa_PCDH_beta-2 opRR671_C3A_IZ Hsa_PCDH_beta-3 oRR444_C2Y_IZ Hsa_PCDH_beta-7 oC831_C2X_IZ Hsa_PCDH_beta-2 opRR671_C3A_IZ Hsa_PCDH_beta-10 oC826_C2Y_IZ Hsa_PCDH_beta-7 oC831_C2X_IZ Hsa_PCDH_beta-9 oRR894_C2DY_IZ Hsa_PCDH_beta-10 oC826_C2Y_IZ Hsa_PCDH_beta-141 oT039_C3X2_IZ Hsa_PCDH_beta-9 oRR894_C2DY_IZ Hsa_PCDH_beta-1 a-12 oC832_C3_IZ Hsa_PCDH_beta-14 -11 oT039_C3X2_IZ oE097_oT898_ Hsa_PCDH_betC5_1 Hsa_PCDH_betaa-12 oC832_C3_IZ amma.1 Am DH Am Hs ph iPC 2 Hs ph iPC DH 2 Hsa_Pa_PCD DH 1 Hsa_Pa_PCDHDH 1 Hsa _P H-1 _PC CDH-1 2 Hsa_P _PCD -12 Hsa _PC DH-19 7 Hsa CD H-17 Hsa_PC Hsa DH _a _PC H-19_a _PC -19_c Hsa_PCHsa_PC DH-19 Hsa_PC DH_alp 
DH-10_1 Hsa_PC DH_alp DH-10_1 _c ha- Hsa_PCDH_alph C2_1 DH_ ha-C2_ a-C Hsa_PC alpha-C 1 Hsa_PC DH_alph 1_1 a-5_ Hsa_PC DH_alph 1_1 Hsa_PC DH_alph 1 a-5_ a-6_1 Hsa_PC DH_alph 1 Hsa_PCD DH_alph a-6_1 a-8_ Hsa_PCD DH_alph Hsa_PCD H_alpha-11_ 1 a-8_ H_alpha. Almost all of these protocadherins are most highly expressed in similar in sequence and sites of expression. highly expressed in the St15 sample. Expression profile of the 17 protocadherin genes located on G2015 Macmillan Publishers Limited.
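The heat maps in Extended Data Figure 4 use a row Z-score colour scale. The legend does not state how the scores were computed, so the following is only a minimal sketch of the usual meaning of that scale: each gene's expression values are centred on the gene's mean and divided by its standard deviation across the tissue samples. The function name `row_zscores`, the choice of the population standard deviation and the toy values are assumptions made for illustration, not data or methods from the paper.

```python
from statistics import mean, pstdev


def row_zscores(matrix):
    """Standardize each row (gene) across its columns (tissues).

    Assumption: population standard deviation; plotting tools may use
    the sample standard deviation instead.
    """
    scored = []
    for row in matrix:
        mu = mean(row)
        sd = pstdev(row)
        if sd == 0:
            # A gene with identical values everywhere has no variation to scale.
            scored.append([0.0 for _ in row])
        else:
            scored.append([(x - mu) / sd for x in row])
    return scored


# Hypothetical expression values: 2 genes x 4 tissues (not from the paper).
toy_expression = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 5.0, 5.0, 5.0],
]
z = row_zscores(toy_expression)
```

Because every row is scaled separately, the colours compare tissues within one gene and say nothing about absolute expression differences between genes.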

[Figure: panel a is a phylogenetic tree with clades labelled Mammalian IL1 & 7, Mammalian IL17, Annelid & non-cephalopod mollusc IL17-like, and Octopus IL17-like; panel b is an expression heat map (colour scale: row Z-score, -3 to 3) with columns PSG, Retina, Supra, St15, Sub, ANC, Testes, Skin, OL, Ova, Suckers and Viscera; panel c is a sequence comparison with rows labelled Octopus IL17-like, Human IL17 (A to F), Lottia IL17-like, Capitella IL17-like and Crassostrea IL17.]

Extended Data Figure 5 | Expansion of interleukin 17 (IL17)-like genes. a, Phylogenetic tree of interleukin genes in Obi, Cte, Cgi, and Lgi. Mammalian IL1A, IL1B, and IL7 used as outgroups. Human and mouse IL17s branch from other members of the IL family. Octopus ILs (as well as all identified invertebrate ILs) group with the mammalian IL17 branch and are named 'IL17-like'. b, Expression profile of 31 octopus IL17-like genes. The 31 octopus genes are distributed across 5 scaffolds: scaffold A (Obi_A), 23 members; scaffold B (Obi_B), 4 members; scaffold C (Obi_C), 2 members; scaffolds D (Obi_D) and E (Obi_E), 1 member each. Heat map rows are arranged by order on each scaffold. Blank rows indicate genes not expressed in our transcriptomes. The 27 genes found in our transcriptomes have strong expression in the suckers and skin. The scaffold C genes are enriched in the PSG and the Scaffold D gene is enriched in the viscera. c, Conserved cysteine residues in human IL17 and invertebrate IL17-like proteins. The human IL17 proteins share a conserved cysteine motif comprising 4 cysteine residues, highlighted in yellow, which may form interchain disulfide bonds and facilitate dimerization56. Octopus IL17-like proteins also contain this four-cysteine motif. One octopus sequence encodes only 3 of these highly conserved cysteine residues. These four cysteines are also present to varying degrees in Lottia, Capitella and Crassostrea sequences. Two additional conserved cysteine residues were found in the octopus sequences and are highlighted in red. The first cysteine residue is found in all invertebrate sequences examined, and none of the mammalian IL17 sequences.
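Panel c of Extended Data Figure 5 compares sequences at the positions of the four-cysteine motif. The sketch below shows one way such a check could be made, assuming the sequences are already aligned and the motif columns are known. The sequences, the column indices and the function name `cysteines_at` are hypothetical and are not taken from the figure.

```python
def cysteines_at(aligned_seq, motif_columns):
    """Count how many motif columns (0-based alignment positions) hold a cysteine."""
    return sum(1 for col in motif_columns if aligned_seq[col] == "C")


# Hypothetical aligned fragments ('-' marks an alignment gap).
toy_alignment = {
    "seq_full": "AC-KCTTCGC",     # cysteine at all four motif columns
    "seq_partial": "AC-KSTTCGC",  # one motif cysteine replaced by serine
    "seq_none": "AS-KSTTSGS",     # no motif cysteines
}
motif_columns = [1, 4, 7, 9]  # assumed positions of the four-cysteine motif

motif_counts = {
    name: cysteines_at(seq, motif_columns)
    for name, seq in toy_alignment.items()
}
```

A sequence scoring 3 of 4 in such a table would correspond to the situation the legend describes for the one octopus sequence that lacks a motif cysteine.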

[Figure: panels a to e are expression heat maps (colour scale: row Z-score, -3 to 3) across the tissue transcriptomes Sub, Supra, ANC, PSG, Testes, Ova, Viscera, Suckers, Skin, St15, OL and Retina. Panel titles: Aplysia-like Chemosensory, Vertebrate-like Chemosensory, Opsin, Frizzled, Adhesion.]

Extended Data Figure 6 | G-protein-coupled receptors. GPCRs, also known as 7-transmembrane (7TM) or serpentine receptors, form a large superfamily that activates intracellular second messenger systems upon ligand binding. This figure considers a subset of the 329 GPCRs we identified in O. bimaculoides. The full complement of GPCRs is presented in Supplementary Note 8.5. a, b, As reported for other lophotrochozoan genomes, the octopus genome contains chemosensory-like GPCRs. 74 GPCRs are similar to the Aplysia chemosensory GPCRs57 and 11 GPCRs are similar to vertebrate olfactory receptors. c, We identified 4 opsins in the octopus genome (from top to bottom): rhodopsin, rhabdomeric opsin, peropsin, and retinochrome. d, The octopus class F GPCRs comprises 6 genes: 5 Frizzled genes and 1 Smoothened gene (*). e, Thirty octopus genes show similarity to vertebrate [...] the adhesion GPCRs.

[Figure panel: sequence alignment. Residues marked above the alignment: S R N V V K Y D C C P Q I Y L]

Obi_12266   CNSVSGDFSFDVDKEVTVKYDGFVHLHIDKIFKTYCRINVENYPFDQHECDITVCLEHQMYMEETIEDF---VIDVKLKTKSNQWNFSFEET-EMEKDD-------VI
Obi_12265   CNSVTGKFSFDDNKEVTVNRNGDVNLYIDKIFETYCRINVEKYPFDEHECDISVCFEHQMYVEETVGEF---DYEVKLQSASNQWDFNFEKS-DVENDN-------IV
Obi_12263+  CNGVMDRFKLDEDTEIFLTNEGTVFLYIDNVFQTYCRINVNKYPFDEHECDLLVCLNHQMRERKRKPSK---------------------------------------
Obi_29097   CNSASGKFTFDEDTGVTLTSNGNTSLYIDRIVNTYCKVNINKYPFDEHECDISVCFRHQINTEETLNNF---VYNVTYNPTYNQWEYTFKEK-DILKEG-------II
Obi_29099   CNSVTGKFTFDGNWGVTIKSDGSVHLHIDQIFHTYCKVNVNKYPFDEHECDISVCFEHQMNLEVMLHDF---MYRVTYKPISNQWDYRYEYR-EVEKEE-------II
Obi_12259   CNDMSGNFAVHK-GGATIEYDGTVTFHMDGIFQTYCTIDMHKYPFDEHECYIKSCLRHQKYKEQTIQNF---SFYNMYNSSSDTWDYKFVVG-DVMENG-------II
Obi_04961   CNDMSGKFSQHEGEGATVKYDGSVSLHMDGIFQTHCTINMLKYPLDEHECNITVCLGHQENIEKTMQSF---SFNNLHNAEADKWEYKFAVG-NVTEKE-------II
Obi_18127   CNSVEGKFKFDEDKQVSVRHDGIVNLNTEGIFNTYCEINMENYPFDEHIC----------------------------------------------------------
Obi_18129+  RNSAEDKFIFQKNKQVFIKYDGTINLHIDGTYRIYCRIDIDKYPFDEHICYLSICLGTEMENQETIQFQ--------------EWEFRLEKT-SEE----------NF
Obi_12260   SNAEKVTKIPSLSEYITVSYDGRTSYFIRSIYRTYCSIDFYKYPFSLNVCKIYFWLSNEMVSYLKLQNV---DLANTSNIWTTIWNIQLDGH-KYDDDNS------TD
Obi_12262   CNAETIYNVSTPIPEAVMLSNGTIKTSTTLVYTIHCKIDNTKYPFDKQACEVHICLPLSKLNNVRIKTI----TTFKKQVTLRNWNVEIDKV-LQNHNER------IF
Obi_12261   CNAIKLIGNHRYERQVTVWQNGVVEEESFYTYQLFCGVDNSRYPFDVQNCPTYICLPYQMNNLTLIKSL---RTDPVKNMEG--WHIHTSTE-PPITYND------QE
Obi_30094   FVNGLSAVESAAEPAIRLEYSGNLNKYQKLSLKTFCPTEKDQYSFS---CPFMLKTYPLPSTQERLRVT---DFEVNEKFQSHQWNAEVNTN-ETRIYNED-------
Obi_30842   CNSVNEQDDSNINREVYVHYNGTVELWSLKYIETYCQVNAYTYPFDDQKCKIQMCVGLHSPDETRLKTI---CYWNMKFTESYKWDIHFSGK-ANGINSQ------SS
Obi_30840   CNSVNEQGDSNINREVYVDYEGTVYLWSLKYIETYCQVNAYTYPFDVHYCGIEMCVPLHSPNETRIQTI---YYRNMNFTENYKWDIHFSGE-ANGKVEE------FS
Obi_30843   CNSMTQSEEKDSLDDVLIYYDGFVRMLSFTLLQTYCQVNAYSYPFDEHKCEIRMCSATYHTDEANVTSF---LLNVYSEEENYKWYMSISDQ-ETY----------SS
Obi_06517   CNSMENSEDKDDFPELWIFSNGHVVMYSFRLLNTYCEVNAYTYPFDKHMCEIYMCVALHSVQHTRIKTL---DYHELNFIQNYKWDITLEGT-VNATNDK------FN
Obi_06518   CNSMDKSEENDGVGELMLTYTGWINMWSFRLLHTYCQINAYTYPFDEHTCEIYLCVALHTINHTRIKEL---IYEDSKFTQNYKWDINVSGK-VNGTDEL------FS
Obi_15560   CNAMKESEEKGSFLEVKVFNNGRVQMRSLKLLKSYCTFDAYAYPFDQHDCEIYICVALHDPVHTRIRTL---TYDNLNYSPNYNWDIDYNGI-KNASDQR------FS
Obi_06519   CNSMKESDDEDNFPEVRIFNNGLVERWSLPLLQSYCEVNAYAYPFDEHICKIYMCIALHTPQHTQINTL---IYYDADHTQNYKWNVNISGE-MKGIKFS------FS
Obi_06522   CNTMKQSEDKDNPSEVSVYFNGSIEISLIKLLHTYCEINAYTFPFDEHTCNVSMCVSLQELHHAKRTKL---TYKS-RQAKHSKWDIKFSGG-TNGTNYYH-----YS
Obi_06521   FNSRTESKYKYSYQDVTVYSKGSVEMVSIRFLHTDCQIEAYIYPFDLQTCYIFLGIPTYKPQDTKIKEI---LCGKENDTTNYQWDITLYCN-VDSANKH------YN
Obi_06520   LNTLMETQSKNNFLEMTVDFNGSVTMVEIKLLQTFCEIYVYNYPFDAQTCVISMGIPSHKFQDTKIKEL---SCYRKSDISDSEWGISFSCN-VHGTNNS------FS

Unlabelled alignment rows:
N S D D S E Y F S Q Y S R F E I L D V T Q K K N S V T Y S C C P
A I S K P E V L T P Q L A R V V S D G E V L Y M P S I R Q R F S C D V S G V D T E S G

Extended Data Figure 7 | O. [...] stagnalis ACh-binding protein (Lst_AchBP). In a and b, [...] Obi_106971). [...] red asterisks [...] residues shared between the alpha-7 subunits are shaded in light grey. Divergent octopus subunits lack nearly all residues necessary for ACh [...] subunits. [...] with ACh-binding [...] subunits are highly expressed in the suckers. [...] was not detected in our transcriptome data sets.
[Figure: phylogenetic tree of acetylcholine receptor subunits from octopus (Obi), Capitella (Cte), Lottia (Lgi), Drosophila (Dme), human (Hsa) and mouse (Mmu), with clades labelled Alpha 7, Alpha 7 like, Alpha 1-4, Alpha 9/10, Alpha 10, Beta 1, Beta 3, Non-Alpha and Putative Non-binding, and a Mammalian group comprising Alpha 1-6, Beta 1-4, Delta, Gamma and Epsilon; expression heat map with columns Ova, Testes, Viscera, PSG, Suckers, Skin, St15, Retina, OL, Supra, Sub and ANC.]