Welcome to Scribd!

Genomic Analyses Using Radseq: 1. Raw Data Manipulation

Uploaded by

0% found this document useful (0 votes)

6 views7 pages

The document describes analyzing raw Illumina sequencing data from stickleback fish using R. It demonstrates uploading and inspecting the raw sequence data, performing pattern matching and subsetting to extract reads containing a specific sequence, cleaning reads by removing those containing N's, and trimming reads. It provides example code and outlines tasks for the student to practice working with ShortRead objects - including uploading data, subsetting based on barcodes and sequences, determining proportions of reads meeting criteria, and writing out filtered reads.

Original Description:

Original Title

RAD-R_ex1

Copyright

Available Formats

PDF, TXT or read online from Scribd

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Report this Document

Copyright:

Available Formats

Download as PDF, TXT or read online from Scribd

Flag for inappropriate content

0% found this document useful (0 votes)

6 views7 pages

Genomic Analyses Using Radseq: 1. Raw Data Manipulation

Uploaded by

Suany Quesada Calderon

Copyright:

Available Formats

Download as PDF, TXT or read online from Scribd

Flag for inappropriate content

Jump to Page

You are on page 1of 7

Search inside document

Evolution and genomics, Cesky Krumlov

Daniel Berner
2. February 2016

Genomic analyses using RADseq:

1. Raw data manipulation
Demo
Upload and inspection of a raw Illumina sequence data set
(stickleback RAD sequences from the Misty system, Canada,
unpublished; 100k subset from a single SE100 Illumina lane)
library(ShortRead)
# d<-readFastq(dirPath='C:/Users/daniel/Documents/science/teaching/cesky
# krumlov 2016/course.materials/R.files',
# pattern='illumina.SE100.fastq', withIds=T)
d

## class: ShortReadQ
## length: 100000 reads; width: 100 cycles

# Call the read IDs of the ShortRead object

id(d)[1:4] # just the first four elements

## A BStringSet instance of length 4

## width seq
## [1] 53 BS-DSFCONTROL03:312:C3...1101:1232:2073 1:Y:0:
## [2] 53 BS-DSFCONTROL03:312:C3...1101:1213:2079 1:Y:0:
## [3] 53 BS-DSFCONTROL03:312:C3...1101:1185:2091 1:Y:0:
## [4] 53 BS-DSFCONTROL03:312:C3...1101:1102:2093 1:Y:0:
# Call the sequences
sread(d)[1:5]

## A DNAStringSet instance of length 5

## width seq
## [1] 100 CTGAATGCAGGTCATTTTGGTN...CGGACGNNNTCCCTGCATCCT
## [2] 100 TAGCATGCAGGAAGTCCGTTGN...CAACTCNNNAAAATTTGCCAA
## [3] 100 GCGCCTGCAGGGCTTTATCCAG...GCTGCTGNNCAGATGTCCTCC
## [4] 100 AGGTATATGCACAAAATGAGAT...CAATCTTNNCAAGCAACAGCA
## [5] 100 NCACGTGCACCAAAAAAAGAGT...NNNNNNNNNNNNNNNNNNNNN

# ... and their qualities

quality(d)[1:5]

## class: FastqQuality
## quality:
## A BStringSet instance of length 5
## width seq
## [1] 100 ;;<;@@2<@2222;@<<><;=#...#####################
## [2] 100 <<<??@<@=222@@@???@??#...#####################
## [3] 100 <<<@@@2<@22<@@@@?@?@@@...#####################
## [4] 100 <<<>?26@=))9>?@@@@@?<?...#####################
## [5] 100 #07>@222:))2=>@?@<;9>=...#####################
Pattern matching, counting

grep("TATATATATATATATATATA", sread(d))

## [1] 6053 15441 82461 84384

match <- grep("TATATATATATATATATATA", sread(d))

length(match)

## [1] 4

Subsetting of ShortRead object

d.match <- d[match]

sread(d.match)

## A DNAStringSet instance of length 4

## width seq
## [1] 100 TAGCATGCAGGGAGGCCTGTGT...TATTTTACACACAACGACAGA
## [2] 100 CTAGGTGCAGGTACAGTGATCG...TGCCTGCTCCCGACCGGCTTC
## [3] 100 CTGATGCAGGACAGGTCCTCCC...ATATATATATATATATATATC
## [4] 100 TAATGTGCAGGAGTCTGTAGTC...TATATATATATATATATATAT
Cleaning a ShortRead object

d.clean <- clean(d) # remove all reads with >= 1 'N'

sread(d.clean)[1]

## A DNAStringSet instance of length 1

## width seq
## [1] 100 CCATGTTGCAGGTGTGAAGGCT...GGGGACACGCCGGCCGTTTGC

Trimming a ShortRead object

d.trim <- narrow(d.match, start = 1, end = 10) # either end or width

quality(d.trim)

## class: FastqQuality
## quality:
## A BStringSet instance of length 4
## width seq
## [1] 10 ==>A<224?2
## [2] 10 BBCDF224A2
## [3] 10 @@@FFADDA2
## [4] 10 CCCFF222C2
Write a ShortRead object out as fastq file

# writeFastq(d.clean,
# file='C:/Users/daniel/Documents/science/teaching/cesky
# krumlov
# 2016/course.materials/R.files/my.clean.reads.fastq')
Tasks
I Upload the stickleback data set illumina.SE100.fastq
I Inspect the ID, sequence and quality of the reads 1000 to 1002
I Generate a new object X containing the data from the reads
10001-20000
I Determine the proportion of X’s reads containing one or more
’N’, and eliminate them from X
I What proportion of the filtered X is derived from the
individual with barcode (first five bases) ’CGATA’ ?
I Derive the object Y from X, including only these specific
CGATA-reads. Confirm that this worked by inspecting the
reads
I What proportion of Y’s reads contains the correct restriction
enzyme overhang ’TGCAGG’ at the correct position (i.e.,
following the barcode)? Copy these reads to object Z
I Clip the barcodes from Z, then write Z out as a fastq file

PEBC Reference Books
Document4 pages
PEBC Reference Books
Sylvia
No ratings yet
Oracle DBA Scripts One in All
Document143 pages
Oracle DBA Scripts One in All
Yasir Zahoor
92% (12)
Lab 9 Strings: MDJ11103 - Computer Programming Laboratory Module
Document6 pages
Lab 9 Strings: MDJ11103 - Computer Programming Laboratory Module
thivaagaar
No ratings yet
Clustering On Boston Dataset
Document3 pages
Clustering On Boston Dataset
anubhav582
No ratings yet
Excel Kod Sayfasi
Document1,617 pages
Excel Kod Sayfasi
FatihUyak
0% (1)
1.diagnosis Using ML
Document69 pages
1.diagnosis Using ML
Choral Wealth
No ratings yet
Computer Project
Document40 pages
Computer Project
Shivendra Ojha
No ratings yet
Mid s18
Document12 pages
Mid s18
RaJu SinGh
No ratings yet
Debarghya Das (Ba-1), 18021141033
Document12 pages
Debarghya Das (Ba-1), 18021141033
Rocking Heartbroker Deb
No ratings yet
Bs at HSW: Victor Omondi 7/27/2021 #Stem Majors Chisquare
Document12 pages
Bs at HSW: Victor Omondi 7/27/2021 #Stem Majors Chisquare
omondivkey
No ratings yet
Web Tech
Document36 pages
Web Tech
NARENDRA YADAV
No ratings yet
Lab 10 - 12
Document12 pages
Lab 10 - 12
KHALiFA Op
No ratings yet
DS Lab
Document31 pages
DS Lab
018 Neelima
No ratings yet
Handout 02
Document12 pages
Handout 02
maxi maaeez
No ratings yet
M. Balaji
Document53 pages
M. Balaji
CSK Marana waiter
No ratings yet
ML Acoerecordiot
Document18 pages
ML Acoerecordiot
Sanjeev Kumar
No ratings yet
Image Classifaction
Document17 pages
Image Classifaction
mk10oct
No ratings yet
S.J. Hariharan
Document53 pages
S.J. Hariharan
CSK Marana waiter
No ratings yet
Exp 7 PDF
Document4 pages
Exp 7 PDF
Ajay
No ratings yet
Multiple Regressor - Jupyter Notebook
Document78 pages
Multiple Regressor - Jupyter Notebook
Elyza
No ratings yet
Load Data The Easy Way
Document6 pages
Load Data The Easy Way
Sakshi
No ratings yet
Week2 DataWrangling DelimitedText PDF
Document5 pages
Week2 DataWrangling DelimitedText PDF
Dantè Chiapperini
No ratings yet
R Commands
Document18 pages
R Commands
Khizra Amir
No ratings yet
Conectar Excel A Access
Document14 pages
Conectar Excel A Access
chucheneicoo
No ratings yet
Computer Network - Lab Manuals
Document29 pages
Computer Network - Lab Manuals
rajneesh pachouri
No ratings yet
N.ragavan PDF
Document52 pages
N.ragavan PDF
CSK Marana waiter
No ratings yet
Big Data Analytics in Apache Spark
Document79 pages
Big Data Analytics in Apache Spark
ArXlan Xahir
No ratings yet
D. Perinban
Document53 pages
D. Perinban
CSK Marana waiter
No ratings yet
Panimalar Engineering College: Bangalore Trunk Road, Varadharajapuram, Nasarathpettai, Poonamallee, Chennai - 600 123
Document53 pages
Panimalar Engineering College: Bangalore Trunk Road, Varadharajapuram, Nasarathpettai, Poonamallee, Chennai - 600 123
CSK Marana waiter
No ratings yet
Logistic Regression Assignment
Document20 pages
Logistic Regression Assignment
Kiran Feroz
No ratings yet
Unnamed: 0 Sample Rock - Type Sio2 Tio2 Al2O3 Fe2O3 Mno Mgo Cao Na2O K2O P2O5 0 0 1 1 2 2 3 3 4 4
Document1 page
Unnamed: 0 Sample Rock - Type Sio2 Tio2 Al2O3 Fe2O3 Mno Mgo Cao Na2O K2O P2O5 0 0 1 1 2 2 3 3 4 4
Nishant Tripathi
No ratings yet
ISYE 6501 Georgia Tech hmwk3.1b
Document5 pages
ISYE 6501 Georgia Tech hmwk3.1b
aletia
No ratings yet
Practical Sample Paper
Document3 pages
Practical Sample Paper
apaar singh
No ratings yet
Computer Networks Lab
Document49 pages
Computer Networks Lab
Kumar Kumar
No ratings yet
02 BDAV Practicals 5-7
Document40 pages
02 BDAV Practicals 5-7
zubair
No ratings yet
Advanced TCL (OpenSees)
Document45 pages
Advanced TCL (OpenSees)
mgrubisic
No ratings yet
Mid s18 Sol
Document11 pages
Mid s18 Sol
RaJu SinGh
No ratings yet
R Programming Tutorial
Document66 pages
R Programming Tutorial
Kabira Kabira
No ratings yet
10-Visualization of Streaming Data and Class R Code-10!03!2023
Document19 pages
10-Visualization of Streaming Data and Class R Code-10!03!2023
G Krishna Vamsi
No ratings yet
Principal Component Analysis - Ipynb
Document27 pages
Principal Component Analysis - Ipynb
Daniel Williams
No ratings yet
Columbia Pre Borard - 1
Document10 pages
Columbia Pre Borard - 1
Abishek Jain
No ratings yet
Assignment-1 80501
Document6 pages
Assignment-1 80501
rishabh7arora
No ratings yet
INS Practical
Document44 pages
INS Practical
sahil
No ratings yet
Homework 3.1 PDF
Document3 pages
Homework 3.1 PDF
dimitris0maketas
No ratings yet
OOP LAB Manual
Document17 pages
OOP LAB Manual
ahmadirtaza10
No ratings yet
Week 10
Document47 pages
Week 10
Minh Tieu
No ratings yet
Ans Tut CH2
Document16 pages
Ans Tut CH2
Dhinesh Raman
No ratings yet
It - S All About Neighbors - Completed
Document14 pages
It - S All About Neighbors - Completed
Soulaima
No ratings yet
Point Query & Tiles Over Kerla: Manish Modani: Ts Timeslice Fts Forecasted Time Slice
Document8 pages
Point Query & Tiles Over Kerla: Manish Modani: Ts Timeslice Fts Forecasted Time Slice
user0x
No ratings yet
Grail
Document23 pages
Grail
Ashish Gupta
No ratings yet
Pyt Manual 1
Document85 pages
Pyt Manual 1
Hrithik Kumar
No ratings yet
NB 9
Document29 pages
NB 9
Alexandra Prokhorova
No ratings yet
Lab6 Results 8974917
Document4 pages
Lab6 Results 8974917
Revathy P
No ratings yet
DAA Lab Manual Simplified Version
Document31 pages
DAA Lab Manual Simplified Version
deepzd517
No ratings yet
DL Record Merged
Document113 pages
DL Record Merged
eeshikagandi
No ratings yet
Mecom1 Question Paper
Document12 pages
Mecom1 Question Paper
pajadhav
100% (1)
EM622 Data Analysis and Visualization Techniques For Decision-Making
Document47 pages
EM622 Data Analysis and Visualization Techniques For Decision-Making
Ridhi B
No ratings yet
Write A Program That Inputs Five Integer From The User and Store Them in An Array
Document2 pages
Write A Program That Inputs Five Integer From The User and Store Them in An Array
Mp Inayat Ullah
100% (2)
Computer Engineering Laboratory Solution Primer
From Everand
Computer Engineering Laboratory Solution Primer
Karan Bhandari
No ratings yet
Advanced Opensees Algorithms, Volume 1: Probability Analysis Of High Pier Cable-Stayed Bridge Under Multiple-Support Excitations, And Liquefaction
From Everand
Advanced Opensees Algorithms, Volume 1: Probability Analysis Of High Pier Cable-Stayed Bridge Under Multiple-Support Excitations, And Liquefaction
Mahdi Alborzi Verki
No ratings yet
Ian Talks JS A-Z: WebDevAtoZ, #1
From Everand
Ian Talks JS A-Z: WebDevAtoZ, #1
Ian Eress
No ratings yet
Introducing The "Step by Step 4ID Guide" For IPC Speakers
Document13 pages
Introducing The "Step by Step 4ID Guide" For IPC Speakers
Suany Quesada Calderon
No ratings yet
LEA: An R Package For Landscape and Ecological Association Studies
Document14 pages
LEA: An R Package For Landscape and Ecological Association Studies
Suany Quesada Calderon
No ratings yet
GOATOOLS: A Python Library For Gene Ontology Analyses
Document17 pages
GOATOOLS: A Python Library For Gene Ontology Analyses
Suany Quesada Calderon
No ratings yet
Scordato Et Al-2017-Molecular Ecology
Document16 pages
Scordato Et Al-2017-Molecular Ecology
Suany Quesada Calderon
No ratings yet
Finding The Genomic Basis of Local Adaptation: Pitfalls, Practical Solutions, and Future Directions
Document19 pages
Finding The Genomic Basis of Local Adaptation: Pitfalls, Practical Solutions, and Future Directions
Suany Quesada Calderon
No ratings yet
Focus On Your Science: Features
Document1 page
Focus On Your Science: Features
Suany Quesada Calderon
No ratings yet
V08 - Yvan Le Bras - Training RADSeq - 0 PDF
Document55 pages
V08 - Yvan Le Bras - Training RADSeq - 0 PDF
Suany Quesada Calderon
No ratings yet
Eva 12696 PDF
Document16 pages
Eva 12696 PDF
Suany Quesada Calderon
No ratings yet
T Z N - Et - Al 2018 Oikos
Document11 pages
T Z N - Et - Al 2018 Oikos
Suany Quesada Calderon
No ratings yet
Tutorial Genomics
Document51 pages
Tutorial Genomics
Suany Quesada Calderon
No ratings yet
A Tutorial: Genome - Based RNA - Seq Analysis Using The TUXEDO Package (Updated: 2014 - 10 - 21)
Document17 pages
A Tutorial: Genome - Based RNA - Seq Analysis Using The TUXEDO Package (Updated: 2014 - 10 - 21)
Suany Quesada Calderon
No ratings yet
Mytilus Eduli M. Trossulus
Document13 pages
Mytilus Eduli M. Trossulus
Suany Quesada Calderon
No ratings yet
IntroTutorial Dartr
Document67 pages
IntroTutorial Dartr
Suany Quesada Calderon
No ratings yet
Using IMa3 PDF
Document70 pages
Using IMa3 PDF
Suany Quesada Calderon
No ratings yet
Macse
Document5 pages
Macse
Suany Quesada Calderon
No ratings yet
Gene Ontology and Pathways: Ståle Nygård
Document38 pages
Gene Ontology and Pathways: Ståle Nygård
Suany Quesada Calderon
No ratings yet
Treml Et Al-2015-Diversity and Distributions
Document12 pages
Treml Et Al-2015-Diversity and Distributions
Suany Quesada Calderon
No ratings yet
396 2007 1 PB PDF
Document13 pages
396 2007 1 PB PDF
Suany Quesada Calderon
No ratings yet
Toro Et Al 2016. BPA and NP Removal From Municipal Wastewater by Tropical Horizontal Subsurface Constructed Wetlands-1
Document1 page
Toro Et Al 2016. BPA and NP Removal From Municipal Wastewater by Tropical Horizontal Subsurface Constructed Wetlands-1
Suany Quesada Calderon
No ratings yet
Ruiz Daniels Rose PDF
Document204 pages
Ruiz Daniels Rose PDF
Suany Quesada Calderon
No ratings yet
Intro Phylo Notes
Document36 pages
Intro Phylo Notes
Suany Quesada Calderon
No ratings yet
Introduction To Phylogeny: 36-149 The Tree of Life Christopher R. Genovese
Document20 pages
Introduction To Phylogeny: 36-149 The Tree of Life Christopher R. Genovese
Suany Quesada Calderon
No ratings yet
Coronavirus Vaccines and The Use of Aborted Fetal Cells - European Institute of Bioethics
Document5 pages
Coronavirus Vaccines and The Use of Aborted Fetal Cells - European Institute of Bioethics
Nikos Papados
100% (1)
Looking For A Workstudent Opportunity in Germany 1676723306
Document1 page
Looking For A Workstudent Opportunity in Germany 1676723306
pasifae.lynx8303
No ratings yet
Hepatitis Prevention
Document3 pages
Hepatitis Prevention
azisbustari
No ratings yet
Plant-Based Antidiabetic Nanoformulations: The Emerging Paradigm For E
Document44 pages
Plant-Based Antidiabetic Nanoformulations: The Emerging Paradigm For E
Alyna Alyna
No ratings yet
Bodel Freedmen in The Cena
Document279 pages
Bodel Freedmen in The Cena
Denis Keyer
No ratings yet
Module Exercise C
Document6 pages
Module Exercise C
Robert
No ratings yet
Chromosome Number Worksheet Meiosis Mitosis
Document3 pages
Chromosome Number Worksheet Meiosis Mitosis
Daisy Gamer
No ratings yet
T - 1437039951caipang 9
Document15 pages
T - 1437039951caipang 9
Doge Wowe
No ratings yet
Lombok Immunizationand Traveller231017-Dr - Agus Somia
Document63 pages
Lombok Immunizationand Traveller231017-Dr - Agus Somia
rinaldy IX9
No ratings yet
Sample Paper - 2008 Class - XII Subject - Biology: General Instructions
Document3 pages
Sample Paper - 2008 Class - XII Subject - Biology: General Instructions
Chandan Kumar Roy
No ratings yet
01 Pharmabest Presentacion 1
Document20 pages
01 Pharmabest Presentacion 1
Alex Rincon
No ratings yet
Transgenic Animals: Production and Application
Document15 pages
Transgenic Animals: Production and Application
All rounders study
No ratings yet
Bio-Sample Questions. Short Answer Type
Document3 pages
Bio-Sample Questions. Short Answer Type
ADITYA THAKUR
No ratings yet
Temperate Bacteriophage
Document26 pages
Temperate Bacteriophage
Izzah Mohd Khalil
No ratings yet
The History of Microbiology
Document3 pages
The History of Microbiology
Asfi Raian Ome
No ratings yet
Human ELISA Kit
Document2 pages
Human ELISA Kit
Edgar Mendoza García
No ratings yet
Life Technologies Gene Engineering
Document54 pages
Life Technologies Gene Engineering
Isaac Nicholas Notorio
No ratings yet
Basic Concepts in Genetics
Document28 pages
Basic Concepts in Genetics
Farooq Hussain
No ratings yet
Concentration Seed Company Ownership
Document2 pages
Concentration Seed Company Ownership
C'estMoi
No ratings yet
Teratogen: Student's Name Institution Course Title Instructor's Name Date
Document10 pages
Teratogen: Student's Name Institution Course Title Instructor's Name Date
Judith Chebet
No ratings yet
PHARMACOVIGILANCE
Document29 pages
PHARMACOVIGILANCE
heyyo gg
No ratings yet
Lab 8 - Transcription-Translation-ONLINE VERSION - 2021
Document11 pages
Lab 8 - Transcription-Translation-ONLINE VERSION - 2021
thesoccerprince.10
No ratings yet
Cloning Tech Guide PDF
Document40 pages
Cloning Tech Guide PDF
Luis Fernando Soto Ugaldi
No ratings yet
Unit 5 - Dna Replication
Document55 pages
Unit 5 - Dna Replication
api-262235970
100% (1)
Chapter 26 Phylogeny and The Tree of Life
Document41 pages
Chapter 26 Phylogeny and The Tree of Life
Sheerin Sulthana
100% (1)
Benjamin Watson - AQA - Biology - Inheritance, Variation and Evolution - KnowIT - GCSE
Document113 pages
Benjamin Watson - AQA - Biology - Inheritance, Variation and Evolution - KnowIT - GCSE
booboo
0% (1)
Restriction Enzymes
Document14 pages
Restriction Enzymes
Knowinter
0% (2)
L20 - DNA Repair & Recombination
Document41 pages
L20 - DNA Repair & Recombination
Leroy Cheng
No ratings yet
BCH361 - Course Outline
Document2 pages
BCH361 - Course Outline
Emilija Bjelajac
No ratings yet