
RF Diffusion Manual

The RF Diffusion manual outlines the process of protein backbone generation using machine learning algorithms, specifically RF Diffusion, ProteinMPNN, and AlphaFold2. It details the steps involved in designing proteins, including backbone generation, sequence design, and experimental validation, while emphasizing the algorithm's ability to create new proteins with desired functions. The manual also describes the practical application of these tools in developing a specific nanobody for OXA48, including the setup and execution of the RF Diffusion Google Colab.

Uploaded by

minhnhut.lxag1988


MANUALS RF-DIFFUSION

SUPERBUGBUSTER TEAM

RF Diffusion manual
Making a protein backbone, the base structure of a protein formed by the repeating chain of nitrogen, α-carbon and carbonyl carbon atoms, is a really challenging task. Historically, protein structures were obtained experimentally, with methods such as X-ray crystallography or cryo-electron microscopy. But these experiments have their limits. This is why, in the past few years, a number of software tools have appeared that predict protein structures computationally, by modeling them.

1. How does it work?

Models like RF Diffusion are useful tools for designing proteins that nature never produced and that are useful to us because of their new or modified functions or properties.

But first, it is important to understand how protein design works.

1- Backbone generation: first, find a basic protein structure that can carry the desired function, for example one able to bind a specific protein. This is the step performed by RF Diffusion.
2- Sequence design: next, find an amino acid sequence that will fold into that backbone structure. This step is done with ProteinMPNN.
3- Computational filtering: the best designs are then selected by working in the reverse direction with other algorithms such as AlphaFold2. The structure of each designed sequence is predicted, and if it does not fold into the structure that was designed, the design is set aside, because a good fold increases the chances of success.
4- Experimental characterization: finally, the designed proteins must of course be validated experimentally.

The first three steps are all included in the Google Colab notebook linked on our Contribution page.
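Steps 1 to 3 can be summarised as a generate, design and refold loop. The sketch below is a minimal Python illustration under stated assumptions. `generate_backbone`, `design_sequences`, `predict_structure` and `rmsd` are hypothetical placeholders passed in as functions, standing in for RF Diffusion, ProteinMPNN, AlphaFold2 and a structural comparison. The 2 Å cutoff is only an example threshold.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence


@dataclass
class Design:
    backbone_id: int
    sequence: str
    rmsd: float


def run_pipeline(
    num_backbones: int,
    generate_backbone: Callable[[int], Any],           # step 1 (hypothetical stand-in for RF Diffusion)
    design_sequences: Callable[[Any], Sequence[str]],  # step 2 (hypothetical stand-in for ProteinMPNN)
    predict_structure: Callable[[str], Any],           # step 3 (hypothetical stand-in for AlphaFold2)
    rmsd: Callable[[Any, Any], float],                 # compares designed vs predicted structure
    max_rmsd: float = 2.0,
) -> List[Design]:
    """Generate backbones, design sequences, refold them, and keep only consistent designs."""
    kept: List[Design] = []
    for backbone_id in range(num_backbones):
        backbone = generate_backbone(backbone_id)
        for sequence in design_sequences(backbone):
            predicted = predict_structure(sequence)
            score = rmsd(backbone, predicted)
            if score <= max_rmsd:  # the sequence refolds into the designed backbone
                kept.append(Design(backbone_id, sequence, score))
    return kept
```

Passing the tools in as functions keeps the sketch runnable with toy stand-ins and makes the order of the steps explicit. The filtering rule in step 3 becomes a single comparison.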

RF Diffusion

This machine learning algorithm uses deep learning to generate protein backbones. It is based on a denoising diffusion probabilistic model. During training, a known amount of random Gaussian noise is added to real protein structures, and the model learns to remove it step by step.

Once trained, it can generate a protein backbone starting from pure noise.

Because every run starts from different random noise, the number of different designs it can produce is virtually unlimited.
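The noising (training) side of a diffusion model can be sketched with NumPy. This is a generic DDPM forward process applied to Cα coordinates only. It uses the linear noise schedule from the original DDPM paper, which is an assumption here. It is not RF Diffusion's implementation: RF Diffusion also noises the orientation of each residue, which this sketch omits. During training, the model learns to undo this noising. Generating a new protein runs the reverse process, starting from pure noise.

```python
import numpy as np


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances (the linear schedule used in the original DDPM paper)."""
    return np.linspace(beta_start, beta_end, num_steps)


def noise_coordinates(x0, t, betas, rng):
    """Jump straight to noising step t (0-based): x_t = sqrt(a)*x_0 + sqrt(1-a)*eps."""
    alpha_bar = float(np.prod(1.0 - betas[: t + 1]))  # fraction of the signal kept
    eps = rng.standard_normal(x0.shape)               # Gaussian noise, same shape as x0
    xt = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
    return xt, eps, alpha_bar


rng = np.random.default_rng(0)
# Toy stand-in for a 60-residue backbone: a random walk of Ca positions, centred on the origin.
ca = np.cumsum(rng.standard_normal((60, 3)) * 2.0, axis=0)
ca -= ca.mean(axis=0)
betas = linear_beta_schedule(1000)
slightly_noisy, _, a_early = noise_coordinates(ca, 10, betas, rng)  # still close to ca
pure_noise, _, a_late = noise_coordinates(ca, 999, betas, rng)      # almost pure noise
```

At early steps almost all of the original signal is kept. By the last step `alpha_bar` is close to zero, so the coordinates are essentially pure Gaussian noise.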

It can generate a completely new backbone of a chosen length, without any sequence as input.
If you already have the structure of a target protein, it can generate another structure that binds to it, like a protein binder.
It can build a protein around a given active site or functional motif.
It can therefore create new proteins with desired functions.
This algorithm is highly accurate, and its designs pass experimental confirmation more often than those of previous design algorithms: around 10% of them.

ProteinMPNN

ProteinMPNN is a deep learning-based protein sequence design method that can design new proteins with high accuracy. It was developed in David Baker's lab and published in 2022. ProteinMPNN is trained on thousands of high-resolution experimental structures from the Protein Data Bank (PDB).
This model takes a protein structure as input and, based on its backbone, predicts new sequences that will fold into that backbone. Optionally, AlphaFold2 can be run on the predicted sequences to check whether they adopt the same backbone.
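ProteinMPNN represents the input backbone as a graph. Each residue is connected to its nearest neighbours in space, and messages are passed along these edges. The minimal NumPy sketch below shows the neighbour-graph step using Cα–Cα distances only. It is an illustration, not ProteinMPNN's code: the real model also encodes distances between other backbone atoms. The default of 48 neighbours follows the ProteinMPNN paper but should be treated as an assumption here. Excluding each residue from its own neighbour list is also a choice of this sketch.

```python
import numpy as np


def knn_graph(ca_coords, k=48):
    """Return, for each residue, the indices of its k nearest residues and their distances."""
    ca = np.asarray(ca_coords, dtype=float)
    k = min(k, len(ca) - 1)                   # cannot have more neighbours than other residues
    diff = ca[:, None, :] - ca[None, :, :]    # pairwise difference vectors
    dists = np.linalg.norm(diff, axis=-1)     # pairwise Ca-Ca distances (Angstrom)
    np.fill_diagonal(dists, np.inf)           # exclude each residue from its own neighbours
    neighbours = np.argsort(dists, axis=1)[:, :k]
    return neighbours, np.take_along_axis(dists, neighbours, axis=1)
```

The returned index array defines the edges of the graph. The model then computes features along these edges and passes messages between neighbouring residues to predict an amino acid at each position.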

AlphaFold2

AlphaFold2 is an artificial intelligence algorithm developed to predict the three-dimensional structure of proteins. It was developed by DeepMind and announced in 2020. AlphaFold2 can predict the three-dimensional structure of a protein from its amino acid sequence with high accuracy.

2. Why did we use it?

For the design of our BacPROTACs, we needed a protein specific to OXA48. We decided to use a nanobody, but none had been described in previous publications. So we found one specific to CMY-2, a homologue of OXA48.
To make that nanobody interact with our target protein, we needed to modify it. A nanobody is composed of four conserved framework regions and three variable loops (the complementarity-determining regions, or CDRs). These variable loops make it specific to one protein. So we decided to use a docking model to modify them and try to find good candidates (following the suggestion of Riccardo Pellarin, who also helped us use this program).

3. How did we use the RF Diffusion Google Colab?

We did not use this algorithm to its full capacity, and we do not claim to know all the tips and ways to use it. We will simply show what we did to find our nanobody.

Warning: without a paid version, Google Colab limits both how long the algorithm can run and how many runs can be made. If many runs are needed, using different email addresses can help. If the time limit is exceeded, the user has to run all the steps again.

Before using this algorithm, we ran AlphaFold2 on the nanobody (CMY-2-specific)/OXA48 dimer to obtain a predicted scaffold structure and see how the two proteins could interact.

Step-by-step procedure:

1. First, the setup cell installs everything the algorithm needs to work. This cell has to finish running before going to the next step.
2. The name of the run: the results will be downloaded under that name at the end of the program.
3. This line lets you choose which parts will be kept fixed and which will be optimized, for example:
A:B1-25/B32-51/B58-100/B116-129
● A → the first monomer (chain of amino acids) will not change
● : → separates the two monomers (contigs)
● B1-25 → the amino acids from position 1 to 25 of the second monomer (B) are fixed
● / → separates two segments within a contig
● B32-51 → the amino acids from position 32 to 51 are fixed, so amino acids 26 to 31 (inclusive) will be optimized

A3-30/36/A33-68
● The algorithm will diffuse a new loop of 36 amino acids between the two fixed parts
4. The amino acid chain is needed here. Use ":" to mark the separation between two monomers.
5. The number of times the algorithm passes over the backbone again and re-optimizes it (i.e. the number of denoising steps) to obtain the best backbone possible. Set by default to 50.
6. How many different initial designs the algorithm will propose. Set by default to 1; we chose 8.
Since we did not need any symmetry, we did not use the second part and left it unchanged.
After the information is added, the cell needs to be run.
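The contig syntax described in step 3 can be illustrated with a small, hypothetical Python parser. It is not RF Diffusion's own parser, which may handle some cases differently. It simply encodes the rules as described above:
- ":" separates chains, and "/" separates segments.
- A bare chain letter keeps the whole chain.
- "B1-25" keeps residues 1 to 25 of chain B.
- A bare number such as 36 asks for that many new residues.
- Residues skipped between two consecutive fixed segments of the same chain are the ones optimized.

```python
import re

# A fixed segment such as "B1-25": chain letter, first residue, last residue.
FIXED_SEGMENT = re.compile(r"^([A-Za-z])(\d+)-(\d+)$")


def parse_contig(contig):
    """Split a contig string into chains (":") and segments ("/")."""
    chains = []
    for chain_str in contig.split(":"):
        segments = []
        for seg in chain_str.split("/"):
            match = FIXED_SEGMENT.match(seg)
            if match:
                chain, start, end = match.group(1), int(match.group(2)), int(match.group(3))
                segments.append(("fixed", chain, start, end))
            elif seg.isalpha():
                segments.append(("whole_chain", seg))  # e.g. "A": keep the chain unchanged
            elif seg.isdigit():
                segments.append(("new", int(seg)))     # e.g. "36": generate 36 new residues
            else:
                raise ValueError(f"unrecognised segment: {seg!r}")
        chains.append(segments)
    return chains


def redesigned_ranges(segments):
    """Residues skipped between two consecutive fixed segments of the same chain."""
    gaps = []
    previous = None
    for seg in segments:
        if seg[0] == "fixed":
            if previous is not None and previous[1] == seg[1] and seg[2] > previous[3] + 1:
                gaps.append((seg[1], previous[3] + 1, seg[2] - 1))
            previous = seg
        else:
            previous = None  # a new or whole-chain segment breaks the run
    return gaps


chains = parse_contig("A:B1-25/B32-51/B58-100/B116-129")
print(redesigned_ranges(chains[1]))  # prints [('B', 26, 31), ('B', 52, 57), ('B', 101, 115)]
```

For the nanobody example, this recovers the three variable stretches that lie between the four fixed framework segments.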

Second cell:
1. This cell will display the 3D structures of all the backbones that were generated.
2. This line lets you choose how many candidate sequences ProteinMPNN will propose for each backbone (for example, we used 8 sequences per backbone).
3. The number of times the algorithm passes over the designed sequence again and re-optimizes it to obtain the best sequence possible. Set by default to 1.
This cell finds the best sequences to fit each backbone and tests them with AlphaFold2 to see whether they fold according to the backbone. It shows sequence coverage plots (for each residue position, the number of aligned sequences that cover it)

and the predicted structures with their per-residue confidence (pLDDT).

Final cells:
1. Shows the best result of the run.

2. Downloads the results to the user's computer.

We ran the program multiple times to obtain many candidates. Once we had all of them, we needed to choose among them.
The program reports several scores that it calculates along the way. Here are the scores and their meanings:

- MPNN score (MPNN: message passing neural network):

This is the score ProteinMPNN gives to each designed sequence: the average negative log-likelihood of the sequence given the backbone. The lower the score, the better the model considers the sequence to fit the backbone.

- pLDDT (predicted local distance difference test):

It is a per-residue confidence score that goes from 0 to 100. For each amino acid residue, it estimates how well the model predicts the local environment of its α-carbon atom.
pLDDT > 90 signifies very high confidence, 70–90 good confidence, and < 70 low model confidence.

- pTM (predicted template modeling score):

It is AlphaFold's estimate of the TM-score between the predicted structure and the true structure, i.e. how well the overall fold is predicted. It goes from 0 to 1. A pTM < 0.2 suggests disordered residues or negligible similarity to the true structure. A pTM > 0.5 suggests the overall fold is predicted well enough to draw conclusions.

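- PAE (predicted aligned error; inter-chain PAE for complexes):

It measures how confidently residues are placed relative to each other in space. For each pair of residues, it is the expected position error (in Ångströms) of one residue when the predicted and true structures are aligned on the other. The inter-chain PAE averages this error over pairs of residues that belong to different chains. Low values (close to 0 Å) indicate a confident prediction. Values close to the maximum of about 30 Å mean that the relative position is essentially unknown.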

- RMSD (root mean square deviation):

It measures the average distance between the atoms (usually the backbone atoms) of two superimposed protein structures. Here, it indicates whether the backbone initially generated as the best design and the predicted fold of the final nanobody align well. The drawback of this score is that between really low values (very good superposition) and really high values, it is difficult to draw conclusions. A value below 2 Å is fairly good.
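The scores above can be sketched in Python with NumPy. These are illustrative helpers, not the code used by the Colab, and each one rests on an assumption:
- `mpnn_score` takes hypothetical per-residue probabilities.
- `ca_plddt` assumes the pLDDT is stored in the B-factor column of the predicted PDB file on the 0–100 scale, as AlphaFold does.
- `tm_score_fixed` only evaluates structures that are already superimposed. The real TM-score also searches for the best superposition, and pTM is AlphaFold's prediction of that value rather than a computed one. The 0.5 Å floor on d0 is an assumption for short chains.
- `interchain_pae` assumes chain A comes first in the PAE matrix.
- `kabsch_rmsd` superimposes two coordinate sets with the Kabsch algorithm before computing the RMSD.

```python
import math

import numpy as np


def mpnn_score(residue_probs):
    """Average negative log-likelihood; residue_probs = probability of the chosen residue."""
    if len(residue_probs) == 0:
        raise ValueError("need at least one residue")
    return -sum(math.log(p) for p in residue_probs) / len(residue_probs)


def ca_plddt(pdb_text):
    """Per-residue pLDDT read from the B-factor field (PDB columns 61-66) of CA atoms."""
    values = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values.append(float(line[60:66]))
    return values


def plddt_band(score):
    """Confidence bands from this manual, for pLDDT on the 0-100 scale."""
    if score > 90:
        return "very high"
    if score >= 70:
        return "good"
    return "low"


def tm_score_fixed(pred, ref):
    """TM-score for structures that are ALREADY superimposed (no search over alignments)."""
    pred, ref = np.asarray(pred, float), np.asarray(ref, float)
    length = len(ref)
    d0 = 1.24 * np.cbrt(length - 15) - 1.8 if length > 15 else 0.5
    d0 = max(d0, 0.5)  # lower bound on d0 for short chains (assumption)
    d = np.linalg.norm(pred - ref, axis=1)
    return float(np.mean(1.0 / (1.0 + (d / d0) ** 2)))


def interchain_pae(pae, len_chain_a):
    """Mean PAE over residue pairs from different chains (chain A first in the matrix)."""
    pae = np.asarray(pae, float)
    a_to_b = pae[:len_chain_a, len_chain_a:]
    b_to_a = pae[len_chain_a:, :len_chain_a]
    return float((a_to_b.sum() + b_to_a.sum()) / (a_to_b.size + b_to_a.size))


def kabsch_rmsd(p, q):
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    p_c = p - p.mean(axis=0)  # remove translation
    q_c = q - q.mean(axis=0)
    h = p_c.T @ q_c           # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0  # avoid improper rotations
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p_c @ rot.T - q_c
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

RMSD and TM-score both compare coordinates, but the TM-score down-weights large deviations through d0. This is why it is easier to interpret across proteins of different lengths than a raw RMSD.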

To rank our nanobodies, we chose to keep only sequences with a pLDDT > 0.89 (the program reports pLDDT on a 0–1 scale, so this corresponds to 89 on the 0–100 scale) and the best RMSD. RF Diffusion rates each nanobody based on its RMSD, and not all of them were below 2 Å, so the lower the better. The PAE, pTM and MPNN scores were good for all of them, so they did not help us to discriminate between designs.
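This ranking rule (pLDDT above 0.89, then lowest RMSD first) can be written as a short Python sketch. The CSV layout and the column names `design`, `plddt` and `rmsd` are hypothetical. Adapt them to the file actually exported by the Colab run.

```python
import csv


def load_scores(lines):
    """Parse design scores from CSV lines; the column names are assumptions."""
    rows = []
    for row in csv.DictReader(lines):
        row["plddt"] = float(row["plddt"])
        row["rmsd"] = float(row["rmsd"])
        rows.append(row)
    return rows


def rank_designs(rows, min_plddt=0.89):
    """Keep designs with pLDDT (0-1 scale) above the cutoff, lowest RMSD first."""
    kept = [r for r in rows if r["plddt"] > min_plddt]
    return sorted(kept, key=lambda r: r["rmsd"])
```

Because `csv.DictReader` accepts any iterable of lines, `load_scores` works both on an open file handle and on a list of strings.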
