
The Transformer Architecture
TheAiEdge.io

[Figure: the overall encoder-decoder architecture. The input sequence 'how' 'are' 'you' 'doing' '?' passes through token and position embeddings into a stack of encoder blocks; the encoder output feeds a stack of decoder blocks, which take the output sequence '[SOS]' 'I' 'am' 'good' 'and' (also token- and position-embedded) and end in a predicting head that emits the next token, 'you'.]

The Overall Architecture
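
A minimal PyTorch sketch of this wiring (not TheAiEdge.io's code; sizes such as vocab_size=10000, d_model=512, and three blocks per stack are illustrative assumptions). Position embeddings and attention masks are left out here and covered in the sections below.

```python
import torch
import torch.nn as nn

class TinyTransformer(nn.Module):
    def __init__(self, vocab_size=10000, d_model=512, n_heads=8, n_blocks=3):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=n_heads,
            num_encoder_layers=n_blocks, num_decoder_layers=n_blocks,
            batch_first=True)
        self.predicting_head = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, tgt_ids):
        src = self.token_emb(src_ids)   # position embeddings omitted; see next section
        tgt = self.token_emb(tgt_ids)
        hidden = self.transformer(src, tgt)   # encoder stack -> decoder stack
        return self.predicting_head(hidden)   # logits over the vocabulary

model = TinyTransformer()
logits = model(torch.randint(0, 10000, (1, 5)),   # 'how are you doing ?'
               torch.randint(0, 10000, (1, 5)))   # '[SOS] I am good and'
print(logits.shape)  # torch.Size([1, 5, 10000])
```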


[Figure: the sinusoidal position embedding. Even dimensions i use a sine, odd dimensions a cosine:
PE(pos, 2i) = sin(pos / 10000^(2i / d_model))
PE(pos, 2i + 1) = cos(pos / 10000^(2i / d_model))]

The Position Embedding
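
A minimal sketch of the sinusoidal embedding above, assuming an even d_model so the sine and cosine columns pair up cleanly:

```python
import torch

def sinusoidal_position_embedding(seq_len, d_model):
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)  # (seq_len, 1)
    i = torch.arange(0, d_model, 2, dtype=torch.float32)           # even dimensions
    angle = pos / torch.pow(10000.0, i / d_model)                  # (seq_len, d_model/2)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(angle)  # even i: sine
    pe[:, 1::2] = torch.cos(angle)  # odd i: cosine
    return pe

print(sinusoidal_position_embedding(5, 512).shape)  # torch.Size([5, 512])
```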


[Figure: the encoder block: a multi-head attention layer followed by layer normalization, then a feed-forward network followed by a second layer normalization.]

The Encoder Block
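
A minimal sketch of the block above, assuming PyTorch's nn.MultiheadAttention and the residual connections of the original architecture (the figure shows only the four sub-layers):

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.attention = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.feed_forward = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):                       # x: (batch, seq, d_model)
        attn_out, _ = self.attention(x, x, x)   # self-attention: q = k = v = x
        x = self.norm1(x + attn_out)            # residual connection + layer norm
        x = self.norm2(x + self.feed_forward(x))
        return x

block = EncoderBlock()
print(block(torch.randn(1, 5, 512)).shape)  # torch.Size([1, 5, 512])
```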


[Figure: self-attention. The hidden states are projected by Wq, Wk, and Wv into queries, keys, and values; the query-key scores pass through a softmax, and the weighted values form the new hidden states.]

The Self-Attention Layer
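
A minimal single-head sketch of the figure's data flow (the scaling by sqrt(d) follows the original scaled dot-product attention; multi-head splitting is omitted):

```python
import math
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, d_model=512):
        super().__init__()
        self.Wq = nn.Linear(d_model, d_model, bias=False)
        self.Wk = nn.Linear(d_model, d_model, bias=False)
        self.Wv = nn.Linear(d_model, d_model, bias=False)

    def forward(self, hidden_states):  # (batch, seq, d_model)
        q = self.Wq(hidden_states)     # queries
        k = self.Wk(hidden_states)     # keys
        v = self.Wv(hidden_states)     # values
        scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
        weights = torch.softmax(scores, dim=-1)  # attention weights
        return weights @ v                       # new hidden states
```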


[Figure: layer normalization applied to each hidden state.]

The Layer Normalization
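
A bare-bones sketch of the normalization itself; the full nn.LayerNorm also learns a per-feature gain and bias, which are omitted here for clarity:

```python
import torch

def layer_norm(hidden_state, eps=1e-5):
    # Normalize each hidden state across its feature dimension.
    mean = hidden_state.mean(dim=-1, keepdim=True)
    var = hidden_state.var(dim=-1, unbiased=False, keepdim=True)
    return (hidden_state - mean) / torch.sqrt(var + eps)

x = torch.randn(1, 5, 512)
print(layer_norm(x).mean(dim=-1)[0, 0])  # ~0: each state is re-centered
```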


[Figure: the position-wise feed-forward network: a linear layer maps each hidden state from d_model to d_ff, and a second linear layer maps it back from d_ff to d_model.]

The Position-wise Feed-forward Network
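
A minimal sketch of the two linear layers above; the ReLU between them follows the original paper (the figure shows only the projections), and d_model=512, d_ff=2048 are the paper's default sizes:

```python
import torch.nn as nn

def position_wise_ffn(d_model=512, d_ff=2048):
    # Applied independently at every position of the sequence.
    return nn.Sequential(
        nn.Linear(d_model, d_ff),   # expand: d_model -> d_ff
        nn.ReLU(),                  # nonlinearity between the two linear layers
        nn.Linear(d_ff, d_model),   # project back: d_ff -> d_model
    )
```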


[Figure: the decoder block: a multi-head attention layer with layer normalization, a cross-attention layer with layer normalization that reads the encoder output, and a feed-forward network with layer normalization.]

The Decoder Block
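
A minimal sketch of the block above. The causal mask on the self-attention is standard for decoders but not shown in the figure, so it is left as an optional argument here:

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.self_attention = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.cross_attention = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.feed_forward = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, x, encoder_output, tgt_mask=None):
        a, _ = self.self_attention(x, x, x, attn_mask=tgt_mask)  # causal mask in practice
        x = self.norm1(x + a)
        a, _ = self.cross_attention(x, encoder_output, encoder_output)
        x = self.norm2(x + a)
        return self.norm3(x + self.feed_forward(x))
```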


[Figure: cross-attention. The decoder hidden states are projected by Wq into queries, while the encoder output is projected by Wk and Wv into keys and values; the softmax-weighted values become the new hidden states.]

The Cross-Attention Layer
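
A minimal single-head sketch: identical to self-attention except for where the queries, keys, and values come from, exactly as in the figure:

```python
import math
import torch
import torch.nn as nn

class CrossAttention(nn.Module):
    def __init__(self, d_model=512):
        super().__init__()
        self.Wq = nn.Linear(d_model, d_model, bias=False)
        self.Wk = nn.Linear(d_model, d_model, bias=False)
        self.Wv = nn.Linear(d_model, d_model, bias=False)

    def forward(self, decoder_hidden, encoder_output):
        q = self.Wq(decoder_hidden)   # queries come from the decoder hidden states
        k = self.Wk(encoder_output)   # keys come from the encoder output
        v = self.Wv(encoder_output)   # values come from the encoder output
        scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
        return torch.softmax(scores, dim=-1) @ v  # new decoder hidden states
```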


[Figure: the predicting head. The encoder reads 'How' 'are' 'you' 'doing' '?'; the decoder hidden states (sequence size x d_model) for '[SOS]' 'I' 'am' 'good' 'and' pass through a linear layer mapping d_model to the vocabulary size, and an ArgMax over the vocabulary yields the decoder predictions, e.g. 'you'.]

The Predicting Head
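
A minimal sketch of the head's shape bookkeeping (vocab_size=10000 and seq_len=5 are illustrative assumptions; at training time one would keep the logits and apply a cross-entropy loss instead of the ArgMax):

```python
import torch
import torch.nn as nn

d_model, vocab_size, seq_len = 512, 10000, 5       # illustrative sizes

predicting_head = nn.Linear(d_model, vocab_size)   # d_model -> vocabulary size
decoder_hidden = torch.randn(seq_len, d_model)     # (sequence size, d_model)

logits = predicting_head(decoder_hidden)           # (sequence size, vocabulary size)
predictions = logits.argmax(dim=-1)                # ArgMax over the vocabulary
print(predictions.shape)                           # one predicted token id per position
```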
