Welcome to Scribd!

20 Project 1 Fun With DNA REGEX Lookaround

Uploaded by

0% found this document useful (0 votes)

5 views2 pages

This document discusses using regular expressions in Python to find opening reading frames (ORFs) in DNA sequences. It explains that an ORF consists of a start codon followed by codons and ending with a stop codon. The example DNA sequence contains two overlapping ORFs that are found using different regex patterns - one that consumes the first ORF and another that uses a positive lookahead to find both ORFs without consuming them. The project adapts examples from Rex Dwyer's ipython notebook to demonstrate finding ORFs in DNA sequences with regex.

Original Description:

Copyright

Available Formats

PDF, TXT or read online from Scribd

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Report this Document

Copyright:

Available Formats

Download as PDF, TXT or read online from Scribd

Flag for inappropriate content

0% found this document useful (0 votes)

5 views2 pages

20 Project 1 Fun With DNA REGEX Lookaround

Uploaded by

ArvindSharma

Copyright:

Available Formats

Download as PDF, TXT or read online from Scribd

Flag for inappropriate content

Jump to Page

You are on page 1of 2

Search inside document

Project 1: Fun with DNA (REGEX Lookaround)!

In this project we nd Opening reading frame or ORF from DNA sequences with the help of Python regex.

DNA is a sequence of bases, A , C , G , or T . They are translated into proteins 3-

bases where each sequence is called a codon. There is a special start codon
ATG , and three stop codons, TGA , TAG , and TAA . Example:

cgcgcATGcATGcgTGAcTAAcgTAGcgcgcgcgc

An opening reading frame or ORF consists of a start codon, followed by some

more codons, and ending with a stop codon. The above example has
overlapping ORFs.

ATGcATGcgTGA and

ATGcgTGAcTAA .

The following pattern only finds the first ORF ( atgcatgcgtga' ). Since it
consumes the first ORF, it also consumes the beginning of the second ORF.

from re import *

dna = 'cgcgcATGcATGcgTGAcTAAcgTAGcgcgcgcgc'
dna = dna.lower()
orfpat = r'(?x) ( atg (?: (?!tga|tag|taa) ... )* (?:tga|tag|taa) )'
print findall(orfpat,dna)

We want to find an ORF without consuming it, we can use a positive

lookahead assertion ( (?= ( atg ). We put the whole ORF pattern inside the
lookahead and find the two atgcatgcgtga and atgcgtgactaa .

from re import *

dna = 'cgcgcATGcATGcgTGAcTAAcgTAGcgcgcgcgc'
dna = dna.lower()
orfpat = r'(?x) (?= ( atg (?: (?!tga|tag|taa) ... )* (?:tga|tag|taa) ))'
s = findall(orfpat,dna)

if s:
print ', '.join(s)

This project adopts and simplifies the Splitsvile examples (DNA) from Rex
Dwyer’s ipython notebook.

Lisp Interpreter in Rust
From Everand
Lisp Interpreter in Rust
Vishal Patil
Rating: 1 out of 5 stars
1/5 (1)
ATPG Methodology Flow
Document37 pages
ATPG Methodology Flow
Adhi Suruli
No ratings yet
ATPG Methodology Flow
Document37 pages
ATPG Methodology Flow
aanbalan
No ratings yet
Go1 2
Document559 pages
Go1 2
Daniel Osorio
No ratings yet
Coding Interview
Document109 pages
Coding Interview
Ashok Obuli
100% (4)
Assignnment 04 Sol Ution
Document5 pages
Assignnment 04 Sol Ution
earn money
No ratings yet
1 Assignemnt 1: 1.1 Name:Rohith ND 1.2 Roll No:20054
Document13 pages
1 Assignemnt 1: 1.1 Name:Rohith ND 1.2 Roll No:20054
Rohith ND
No ratings yet
Paparala Port Interface
Document11 pages
Paparala Port Interface
Radheshyam Rai
No ratings yet
Section A Finding Public Biological Databases
Document4 pages
Section A Finding Public Biological Databases
Misty Brown
No ratings yet
12 Pubkey DH Annotated
Document42 pages
12 Pubkey DH Annotated
-Ashrant Shinh-
No ratings yet
Sequence File Formats
Document22 pages
Sequence File Formats
Pragya Mukherjee
No ratings yet
Illumina Adapter and Primer Sequences
Document8 pages
Illumina Adapter and Primer Sequences
speshec
No ratings yet
Internal Sorting Array
Document15 pages
Internal Sorting Array
suryaprabha
No ratings yet
Java QnA
Document240 pages
Java QnA
Gustavo Marquez
No ratings yet
Assignment 1
Document5 pages
Assignment 1
mk5514075
No ratings yet
Lecture 4
Document20 pages
Lecture 4
ashodhiya14
No ratings yet
FIRST Set in Syntax Analysis: Lecture-05
Document14 pages
FIRST Set in Syntax Analysis: Lecture-05
abu syed
No ratings yet
PHP LNG Exh
Document5 pages
PHP LNG Exh
joedeveloper
No ratings yet
Aman Nye
Document13 pages
Aman Nye
Solah Alaam
No ratings yet
Matlab Python Cheatsheet Formulae PDF
Document17 pages
Matlab Python Cheatsheet Formulae PDF
Chandra Prakash Khatri
100% (1)
1 Complexity and Recursion
Document6 pages
1 Complexity and Recursion
Shuvadeep Das
No ratings yet
Opcode32 Preview
Document4 pages
Opcode32 Preview
Stewart Henderson
No ratings yet
Intro To NGS - Torsten Seemann - PeterMac - 27 Jul 2012
Document51 pages
Intro To NGS - Torsten Seemann - PeterMac - 27 Jul 2012
Gusti Deky Genetika Science
No ratings yet
Basa RNA
Document4 pages
Basa RNA
Ich Ram
No ratings yet
Two Pass Assembler Code
Document4 pages
Two Pass Assembler Code
Ashish Katlam
No ratings yet
03-134181-026 Daa PDF
Document4 pages
03-134181-026 Daa PDF
Hamza Phanoti
100% (1)
Software Implementation and Testing of GARCH Models
Document33 pages
Software Implementation and Testing of GARCH Models
meetmauro
No ratings yet
L11 Biol 261 Ftranslation 2014
Document44 pages
L11 Biol 261 Ftranslation 2014
jdazuelos
No ratings yet
Ericsson Command Handling-1
Document6 pages
Ericsson Command Handling-1
mamshadmm
No ratings yet
Efficient Parsing Using Recursive Transition Networks With Output
Document5 pages
Efficient Parsing Using Recursive Transition Networks With Output
jsastrem
No ratings yet
Assignment 3
Document8 pages
Assignment 3
Samson Fung
No ratings yet
AmberTools13 Cpptraj Ptraj
Document91 pages
AmberTools13 Cpptraj Ptraj
mayankgia
No ratings yet
Module 1 AAD
Document9 pages
Module 1 AAD
cahebev
No ratings yet
Iptables - A INPUT - P TCP - Tcp-Flags SYN, RST, ACK, FIN SYN - J ACCEPT # Äquivalent Zu - Syn
Document2 pages
Iptables - A INPUT - P TCP - Tcp-Flags SYN, RST, ACK, FIN SYN - J ACCEPT # Äquivalent Zu - Syn
list_course
No ratings yet
Lecture 1
Document24 pages
Lecture 1
anantdrive2103
No ratings yet
The Hobby Package: Andrew Stacey 1.7 From 2016-02-19
Document23 pages
The Hobby Package: Andrew Stacey 1.7 From 2016-02-19
Gyulai Béla
No ratings yet
Lapacke
Document10 pages
Lapacke
Mario Zamora
No ratings yet
Kaks - Calculator Toolbox 2.0: User'S Manual
Document19 pages
Kaks - Calculator Toolbox 2.0: User'S Manual
蔡森
No ratings yet
Axiom Systems For Classical First-Order Logic
Document12 pages
Axiom Systems For Classical First-Order Logic
omertave
No ratings yet
C Odigo:: #Include #Define
Document2 pages
C Odigo:: #Include #Define
Karen
No ratings yet
KulBiotePert 3rd 2021
Document63 pages
KulBiotePert 3rd 2021
wicaksonobimaw
No ratings yet
Name Synopsis: Libgraph Abstract Graph Library
Document4 pages
Name Synopsis: Libgraph Abstract Graph Library
Carlos Dorado
No ratings yet
ORF Finder Exercise-2
Document2 pages
ORF Finder Exercise-2
Khai Cheng Kiew
No ratings yet
Data Structures and Algorithms (DSA) - Cropped
Document380 pages
Data Structures and Algorithms (DSA) - Cropped
Swayam Gosavi
No ratings yet
8GOM
Document6 pages
8GOM
aakif inayat
No ratings yet
MIT6 087IAP10 Assn03
Document7 pages
MIT6 087IAP10 Assn03
JHON LIZARAZO
No ratings yet
EEG Presentation
Document35 pages
EEG Presentation
Leena Gmal
No ratings yet
#Include #Include #Include Void
Document3 pages
#Include #Include #Include Void
Tanya Sunish
No ratings yet
Text2pcap Man Instruct
Document3 pages
Text2pcap Man Instruct
Bharanitharan Sundaram
No ratings yet
S14bac cl05
Document3 pages
S14bac cl05
Khhg Agdds
No ratings yet
Code
Document37 pages
Code
Sunil Singireddy
No ratings yet
Name: Shubhangi Mapare Roll No: 17112 Div: TE A Subject: SPOS Lab
Document7 pages
Name: Shubhangi Mapare Roll No: 17112 Div: TE A Subject: SPOS Lab
Balbhim Mapare
No ratings yet
DAA Module-1
Document21 pages
DAA Module-1
Krittika Hegde
No ratings yet
MCAREPORT
Document12 pages
MCAREPORT
pragyan jyoti
No ratings yet
BANING Activity No. 5.2 DNA Base Pairing
Document3 pages
BANING Activity No. 5.2 DNA Base Pairing
Fatima Nur Faiza Handa
No ratings yet
Oliver Hampton: X Ith Co-Authors Phuc Nguyen and Alessandro Dal Palú
Document51 pages
Oliver Hampton: X Ith Co-Authors Phuc Nguyen and Alessandro Dal Palú
Mudit Misra
No ratings yet
Introduction to Algorithms
From Everand
Introduction to Algorithms
S VASIST
No ratings yet
Combinatorial Algorithms: For Computers and Calculators
From Everand
Combinatorial Algorithms: For Computers and Calculators
Albert Nijenhuis
Rating: 4 out of 5 stars
4/5 (2)
IP Routing Protocols All-in-one: OSPF EIGRP IS-IS BGP Hands-on Labs
From Everand
IP Routing Protocols All-in-one: OSPF EIGRP IS-IS BGP Hands-on Labs
Redouane MEDDANE
No ratings yet
String Algorithms in C: Efficient Text Representation and Search
From Everand
String Algorithms in C: Efficient Text Representation and Search
Thomas Mailund
No ratings yet
9 Python Regex Group Functions
Document2 pages
9 Python Regex Group Functions
ArvindSharma
No ratings yet
16 Basics
Document2 pages
16 Basics
ArvindSharma
No ratings yet
18 Python Lookbehind
Document2 pages
18 Python Lookbehind
ArvindSharma
No ratings yet
8 Python Regex Match Vs Search Functions
Document2 pages
8 Python Regex Match Vs Search Functions
ArvindSharma
No ratings yet
7 Python Regex Search Function Coding Exercise
Document2 pages
7 Python Regex Search Function Coding Exercise
ArvindSharma
No ratings yet
3 Quiz 1 REGEX Patterns
Document4 pages
3 Quiz 1 REGEX Patterns
ArvindSharma
No ratings yet
4 Python Regex Match Function
Document4 pages
4 Python Regex Match Function
ArvindSharma
No ratings yet