Python for Biologists
A collection of episodes with videos, code, and exercises for learning the basics of the Python programming language through genomics examples.

Introduction to Python

Welcome
    print('Python for Biologists!')
This is a customary first program in any language: a single line of code to greet Python. Despite its simplicity, this episode provides essential information on the Unix operating system and on writing, editing, and executing Python code.
Download (pythonSource/welcome.py)

Beginnings
    organism = "Escherichia coli"
Variables are the main building blocks of any computer code. Variable names in Python start with a letter (or an underscore), followed by any combination of letters, digits, or underscores. In this episode, we will meet some of the fundamental Python variable types: numeric, string, list, and tuple.
Download (pythonSource/beginnings.py)

Print
    print('You have', human_genes, 'genes')
Providing informative and visually pleasing output is the aspiration of all good programmers. In this episode, we explore some of the finer points of the print() function, the basic output facility of Python.
Download (pythonSource/print.py)

Buggy
    human genes = 20,000
Inevitable programming errors (bugs) come in two species: logical and syntactical. This episode is about strategies for eradicating syntactical bugs with a little help from the Python interpreter.
Download (pythonSource/buggy.py)

Introduction to Python - Exercises I

Data Types

Numeric
    firstProduct = (9.4*0.2321)*5.6
    secondProduct = 9.4*(0.2321*5.6)
Numeric data types of Python include integers and floats. In this episode, we will introduce some of the basic operations with these data types and face several inevitable shortcomings of computer arithmetic.
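In ordinary arithmetic the two Numeric products are equal. Floating-point numbers, however, are stored in binary with limited precision, so grouping the operations differently can change the last digits of a result. The short sketch below is an illustration and not the episode's own source code. It compares the episode's two products and a well-known case, (0.1 + 0.2) + 0.3 versus 0.1 + (0.2 + 0.3), using math.isclose() instead of ==. The helper nearly_equal and its default tolerance of 1e-9 are arbitrary choices made here.

```python
import math

# The two products from the Numeric episode. In exact arithmetic they are
# equal, but floating-point multiplication is not guaranteed to be associative,
# so the last digits may differ.
first_product = (9.4 * 0.2321) * 5.6
second_product = 9.4 * (0.2321 * 5.6)

# A well-known case where the grouping visibly changes a floating-point sum.
left_grouped = (0.1 + 0.2) + 0.3   # 0.6000000000000001
right_grouped = 0.1 + (0.2 + 0.3)  # 0.6


def nearly_equal(a: float, b: float, rel_tol: float = 1e-9) -> bool:
    """Compare two floats with a relative tolerance instead of ==."""
    return math.isclose(a, b, rel_tol=rel_tol)


print(first_product, second_product)
print(left_grouped == right_grouped)                # False
print(nearly_equal(left_grouped, right_grouped))    # True
print(nearly_equal(first_product, second_product))  # True

# Integer arithmetic is exact: / always returns a float,
# while // performs floor division on integers.
print(7 / 2, 7 // 2)  # 3.5 3
```

The general rule is to compare computed floats with a tolerance rather than testing them for exact equality.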
Download (pythonSource/numeric.py)

Strings
    protein = "GFP"
The string data type of Python holds a sequence of characters enclosed in quotation marks. In this episode, we will illustrate some of the basic string operations on partial DNA and protein sequences of GFP.
Download (pythonSource/strings.py)

List
    stop_codons = ['TAA', 'TAG']
Lists are used to store a collection of similar items. Lists are defined by listing their contents in square brackets, separated by commas. The contents of a list are indexed beginning with 0. In this episode, we will explore basic operations on lists.
Download (pythonSource/list.py)

Tuple
    Histidine = ('H', 'CAT', 'CAC')
A tuple is an immutable sequence data type of Python consisting of a number of items separated by commas. In this episode, we will illustrate several tuple operations using basic amino acids.
Download (pythonSource/tuple.py)

Dictionary
    restriction_enzymes = {'EcoRI': 'GAATTC', 'AluI': 'AGCT'}
A dictionary is a type of variable akin to a list. However, instead of indexing values with integers, dictionaries use a unique name (a key) for each entry. In this episode, we will use restriction enzymes to explore some of the basic operations with dictionaries.
Download (pythonSource/dictionary.py)

Data Types - Exercises II

String Manipulations

Length
    len(zika_DNA)
The Python len() function takes a string as an argument and returns the number of characters in the string. In this episode, we will compute the number of nucleotides of a Zika DNA segment (the complete Zika genome has 10,794 bp).
Download (pythonSource/length.py)

Concatenation
    RNA_seq = RNA_seq + S_codon
The Python binary operator + concatenates two strings. In this episode, we will use this operator to concatenate several codons coding for the GFP protein.
Download (pythonSource/concatenation.py)

Find
    chimp.find(codon, 27, 55)
The Python .find() string method searches for a substring inside a string and returns the position of the first character of the substring inside the string, or -1 if the substring is not found. The optional start and end arguments (27 and 55 here) restrict the search to that portion of the string. In this episode, we will use the .find() method to locate a codon in the chimp D-loop.
Download (pythonSource/find.py)

Slice
    human[-6:-3]
The Python sequence slicing facility [:] extracts a substring of a prescribed length and position from a string. In this episode, we will use slicing to extract codons from the human D-loop.
Download (pythonSource/slice.py)

Translate
    complement = zika.translate(complements)
The Python string methods str.maketrans() and str.translate() are used to map one set of characters to another. In this episode, we will use these methods to compute the complementary strand of a Zika DNA segment.
Download (pythonSource/translate.py)

Reverse
    reversed_zika_DNA = zika_DNA[::-1]
The Python string slicing operation string[::-1] computes the reverse of the string. In this episode, we will use this slicing to reverse a segment of the Zika virus genome.
Download (pythonSource/reverse.py)

Replace
    zika_DNA = zika_DNA.replace(' ', '')
The Python string method str.replace(old, new) changes all occurrences of a substring in a string to another substring. In this episode, we will use this method to remove spaces in a segment of Zika DNA in GenBank format.
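Here is a minimal sketch of this idea. It uses a short made-up fragment rather than the episode's Zika sequence. The code removes the spaces and line breaks that GenBank-style formatting inserts between blocks of bases. It also shows two properties of str.replace(): it returns a new string, because Python strings are immutable, and its optional third argument limits the number of replacements. Real GenBank ORIGIN lines also begin with position numbers, and this sketch leaves those out.

```python
# A short, made-up DNA fragment in GenBank-like layout: blocks of ten
# bases separated by spaces, wrapped onto a second line. (Real GenBank
# ORIGIN lines also start with a position number, omitted here.)
genbank_block = "atgcatgcat gggcccaaat ttagcgcgat\natatcgcgta cccgggtata"

# str.replace() returns a NEW string; genbank_block itself is unchanged
# because Python strings are immutable.
no_spaces = genbank_block.replace(' ', '')
clean_dna = no_spaces.replace('\n', '').upper()

print(clean_dna)
print(len(clean_dna))  # 50

# The optional third argument limits how many occurrences are replaced.
print('a b c d'.replace(' ', '', 2))  # abc d
```

Because each call returns a new string, the calls can be chained, or the result reassigned to the same name as in the episode's snippet.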
Download (pythonSource/replace.py)

String Manipulations - Exercises III

Conditionals and Loops

IfElse
    if (codon1 == 'ATG'):
The if-else statement is the basic decision-making tool for choosing between two alternatives. In this episode, we will determine whether the first codon in a DNA segment is the start codon ATG or not, and report the result.
Download (pythonSource/if_else.py)

IfElif
    elif ((codon1 == 'TAA') or
          (codon1 == 'TAG') or ...
The if-elif statement is the basic decision-making tool for choosing among multiple alternatives. In this episode, we will determine whether the last codon in a DNA segment is the start codon ATG, one of the stop codons (TAA, TAG, TGA), or neither of the above.
Download (pythonSource/if_elif.py)

While
    while index < len(DNA_seq):
The while loop of Python iterates a block of statements as long as an expression remains True. In this episode, we will use a while loop to develop an algorithm that computes the number of appearances of a nucleotide base (T) in a DNA string.
Download (pythonSource/while.py)

For
    for second_base in bases:
The for loop of Python steps through a list, performing one iteration for each value in the list. In this episode, we will use two nested for loops to print out all codons starting with a thymine (T).
Download (pythonSource/for.py)

Range
    range(0, number_of_years + 1, 1)
The Python range() function can be used to generate a sequence of integers with a desired progression. In this episode, we will use a for loop over a range()-generated sequence to model the population size of Florida sandhill cranes in the Okefenokee swamp.
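The episode's actual population model is not reproduced here. As a hedged illustration, the sketch below assumes a simple geometric growth model with made-up parameters: an initial population of 400 birds and 2% net growth per year. Neither figure comes from the episode or from field data. The sketch uses range(0, number_of_years + 1, 1), as in the snippet above, so that the final year is included. The function name project_population is hypothetical.

```python
# Made-up parameters for illustration only; they are NOT the episode's
# values or measured data for the Okefenokee sandhill cranes.
initial_population = 400
growth_rate = 0.02  # assumed net growth of 2% per year
number_of_years = 5


def project_population(initial: float, rate: float, years: int) -> list:
    """Return (year, population) pairs for years 0..years under
    geometric growth (a hypothetical model chosen for this sketch)."""
    sizes = []
    population = initial
    # range(0, years + 1, 1) yields 0, 1, ..., years; the stop value itself
    # is excluded, hence the + 1.
    for year in range(0, years + 1, 1):
        sizes.append((year, population))
        population = population * (1 + rate)
    return sizes


sizes = project_population(initial_population, growth_rate, number_of_years)
for year, size in sizes:
    print(f"Year {year}: {size:.1f} cranes")
```

The + 1 in the stop argument is the key detail. Without it, the loop would stop one year short of number_of_years.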
Download (pythonSource/range.py)

Count with Dictionary
    codon_counter[codon] += 1
A Python dictionary variable, with its associated operations, is the tool of choice for tallying. In this episode, we use a dictionary variable to count the number of times each codon appears in a DNA segment.
Download (pythonSource/count_with_dictionary.py)

Conditionals and Loops - Exercises IV

Utilities

Argv
    if len(sys.argv) == 1:
        print('Please provide an argument!')
In Python, command-line arguments are provided after the name of the program. They are stored in a list called sys.argv. The first element of this list, sys.argv[0], is the name of the program itself, so a length of 1 means that no arguments were given. In this episode, we will demonstrate the use of arguments and the sys.argv list.
Download (pythonSource/argv.py)

Input
    input('Please type a NCBI number: ')
The Python input() function is used to receive input from the keyboard and always returns the typed text as a string. In this episode, we will demonstrate how to capture user input from the keyboard and prepare it for use in Python code.
Download (pythonSource/input.py)

Try/except
    except ValueError as ve:
Errors detected during the execution of code (exceptions) can be handled with a try-except statement. In this episode, we will learn how to handle several common exceptions arising from erroneous user input.
Download (pythonSource/try_except.py)

Function
    def DNAtoRNA(dna):
A function is a reusable, modular code fragment designed to perform a specific task. In this episode, we will write a function that takes a DNA segment as input and returns the corresponding RNA segment as output.
Download (pythonSource/fuction.py)

Utilities - Exercises V

Files

Write to File
    with open(filename, 'w') as f:
Python can open a file with a designated file handle and a file name for the purpose of writing. You can then write to the opened file with the write() method of the file handle or with a print() call directed to the file handle (print(..., file=f)). In this episode, we will learn how to open a file and write to it in FASTA format.
Download (pythonSource/write_to_file.py)

Read from File
    f = open(myfile, 'r')
Python can open a file with a designated file handle and a file name for the purpose of reading. Then, one can simply read from the opened file, one line at a time. In this episode, we will learn how to open a file in Python code and read its content.
Download (pythonSource/read_from_file.py)

Read FASTA File
    ... '>' not in line:
In this episode, we will open a file containing a DNA sequence (BRCA2) in FASTA format. Then, we will read the file line by line, skip the header line starting with the > sign, strip newline characters, and concatenate all the remaining lines into a single string.
Download (pythonSource/read_fasta_file.py)

Files - Exercises VI

Regular Expressions

Motif Search
    motif = r'AATG.*?TAA'
Python's re.search() function finds the first occurrence of a motif (a substring with a specific pattern) in a string. In this episode, we will use this function to check whether a motif is present in a DNA sequence, and report the matched substring and its start and end indices.
Download (pythonSource/motif_search.py)

Motif Search Groups
    motif = r'(ATG(.*?)TAA)'
In this episode, we will explore the re.search() function of Python further to extract information about the parenthesized (...) groupings in a motif.
Download (pythonSource/motif_search_groups.py)

Motif Find All
    re.findall(motif, DNA_sequence)
In this episode, we will use the re.findall() function of Python, which returns a list of all non-overlapping matches, to locate all occurrences of a motif in a DNA sequence.
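Here is a minimal sketch of re.findall() on a short, made-up DNA string, not the episode's data. When the pattern has no capturing groups, re.findall() returns a list of all non-overlapping matched substrings. When the pattern has one capturing group, it returns only the text captured by that group. The pattern ATG.*?TAA does not check the reading frame: for each match, it simply stops at the first TAA after the ATG.

```python
import re

# A short, made-up DNA string used only for illustration.
DNA_sequence = "CCATGAAATAAGGATGCCCTAAT"

# Non-greedy motif: ATG, then as few bases as possible, then TAA.
# (This pattern does not check that TAA is in the same reading frame.)
motif = r'ATG.*?TAA'
print(re.findall(motif, DNA_sequence))  # ['ATGAAATAA', 'ATGCCCTAA']

# With one capturing group, findall returns only the captured text.
inner_motif = r'ATG(.*?)TAA'
print(re.findall(inner_motif, DNA_sequence))  # ['AAA', 'CCC']

# If there is no match, findall returns an empty list rather than None.
print(re.findall(r'GGGG', DNA_sequence))  # []
```

Unlike re.search(), which returns a single match object or None, re.findall() always returns a list. This makes it convenient for counting occurrences with len().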
Download (pythonSource/motif_findall.py)

Motif Find Iter
    re.finditer(motif, DNA_sequence)
In this episode, we will use the re.finditer() function of Python to locate all occurrences of a motif and their starting and ending indices.
Download (pythonSource/motif_finditer.py)

Motif Search Interactive
    if motif_found == True:
Complex motifs can be specified with regular expressions. In this episode, we will develop an interactive Python program that asks the user for a motif encoded as a regular expression, searches for the motif in a DNA sequence, and reports all its matches.
Download (pythonSource/motif_search_interactive.py)

Python Tripos | Regular expressions summary | Regular Expressions - Exercises VII | Online Emulator | References

Copyright © 2020, Hüseyin Koçak (http://math.miami.edu/~hk), University of Miami and Başar Koç (https://www.basarkoc.com), Stetson University
