Package ‘Rsamtools’

February 25, 2017
Type Package
Title Binary alignment (BAM), FASTA, variant call (BCF), and tabix
file import
Version 1.27.12
Author Martin Morgan, Herv\{}'e Pag\{}`es, Valerie Obenchain, Nathaniel Hayden
Maintainer Bioconductor Package Maintainer
<maintainer@bioconductor.org>
Description This package provides an interface to the 'samtools',
'bcftools', and 'tabix' utilities (see 'LICENCE') for
manipulating SAM (Sequence Alignment / Map), FASTA, binary
variant call (BCF) and compressed indexed tab-delimited
(tabix) files.

URL http://bioconductor.org/packages/release/bioc/html/Rsamtools.html
License Artistic-2.0 | file LICENSE
LazyLoad yes
Depends methods, GenomeInfoDb (>= 1.1.3), GenomicRanges (>= 1.21.6),
Biostrings (>= 2.37.1)
Imports utils, BiocGenerics (>= 0.1.3), S4Vectors (>= 0.13.8), IRanges
(>= 2.3.7), XVector (>= 0.15.1), zlibbioc, bitops, BiocParallel
Suggests GenomicAlignments, ShortRead (>= 1.19.10), GenomicFeatures,
TxDb.Dmelanogaster.UCSC.dm3.ensGene, KEGG.db,
TxDb.Hsapiens.UCSC.hg18.knownGene, RNAseqData.HNRNPC.bam.chr14,
BSgenome.Hsapiens.UCSC.hg19, pasillaBamSubset, RUnit, BiocStyle
LinkingTo S4Vectors, IRanges, XVector, Biostrings
biocViews DataImport, Sequencing, Coverage, Alignment, QualityControl
Video https://www.youtube.com/watch?v=Rfon-DQYbWA&list=UUqaMSQd_h-2EDGsU6WDiX0Q
NeedsCompilation yes

R topics documented:
Rsamtools-package . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
applyPileups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
ApplyPileupsParam . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
BamFile . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1

2 Rsamtools-package

BamInput . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
BamViews . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
BcfFile . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
BcfInput . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Compression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
deprecated . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
FaFile . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
FaInput . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
headerTabix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
indexTabix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
pileup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
PileupFiles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
quickBamFlagSummary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
readPileup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
RsamtoolsFile . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
RsamtoolsFileList . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
ScanBamParam . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
ScanBcfParam-class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
seqnamesTabix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
TabixFile . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
TabixInput . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
testPairedEndBam . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

Index 60

Rsamtools-package ’samtools’ aligned sequence utilities interface

Description
This package provides facilities for parsing samtools BAM (binary) files representing aligned se-
quences.

Details
See packageDescription('Rsamtools') for package details. A useful starting point is the scanBam
manual page.

Note
This package documents the following classes for purely internal reasons, see help pages in other
packages: bzfile, fifo, gzfile, pipe, unz, url.

Author(s)
Author: Martin Morgan
Maintainer: Biocore Team c/o BioC user list <bioconductor@stat.math.ethz.ch>

References
The current source code for samtools and bcftools is from https://github.com/samtools/samtools.
Additional material is at http://samtools.sourceforge.net/.

applyPileups 3

Examples
packageDescription('Rsamtools')

applyPileups Apply a user-provided function to calculate pile-up statistics across
multiple BAM files.

Description
applyPileups scans one or more BAM files, returning position-specific sequence and quality sum-
maries.

Usage
applyPileups(files, FUN, ..., param)

Arguments
files A PileupFiles instances.
FUN A function of 1 argument, x, to be evaluated for each yield (see yieldSize,
yieldBy, yieldAll). The argument x is a list, with elements describing the
current pile-up. The elements of the list are determined by the argument what,
and include:
seqnames: (Always returned) A named integer() representing the seqnames
corresponding to each position reported in the pile-up. This is a run-length
encoding, where the names of the elements represent the seqnames, and the
values the number of successive positions corresponding to that seqname.
pos: Always returned) A integer() representing the genomic coordinate of
each pile-up position.
seq: An array of dimensions nucleotide x file x position.
The ‘nucleotide’ dimension is length 5, corresponding to ‘A’, ‘C’, ‘G’, ‘T’,
and ‘N’ respectively.
Entries in the array represent the number of times the nucleotide occurred
in reads in the file overlapping the position.
qual: Like seq, but summarizing quality; the first dimension is the Phred-
encoded quality score, ranging from ‘!’ (0) to ‘~’ (93).
... Additional arguments, passed to methods.
param An instance of the object returned by ApplyPileupsParam.

Details
Regardless of param values, the algorithm follows samtools by excluding reads flagged as un-
mapped, secondary, duplicate, or failing quality control.

Value
applyPileups returns a list equal in length to the number of times FUN has been called, with each
element containing the result of FUN.
ApplyPileupsParam returns an object describing the parameters.

-colSums(p * log(p)) ifelse(cvg == 4L.system.applyPileups(fls. h) }) list(seqnames=x[["seqnames"]]. function(y) { y <. calcInfo) identical(res. IRanges(c(1000.sourceforge. Examples fl <. param=param) res1 <. "seqnames") . "seq2"). package="Rsamtools". "[[".net/ See Also ApplyPileupsParam. fl)) calcInfo <- function(x) { ## information at each pile-up position info <.colSums(y) p <.y + 1L # continuity cvg <.PileupFiles(c(fl. NA. "T"). param=param) str(res) head(res[[1]][["pos"]]) # positions matching param head(res[[1]][["info"]]) # inforamtion in each file ## 'param' as part of 'files' fls1 <.PileupFiles(c(fl.ApplyPileupsParam(which=which.applyPileups(fls1.GRanges(c("seq1". info=info) } which <. pos=x[["pos"]].4 applyPileups Author(s) Martin Morgan References http://samtools. 2.drop=FALSE] y <. "C".y / cvg[col(y)] h <. 2000)) param <. yieldBy="position". what="seq") res <.apply(x[["seq"]]. across ranges param <. calcInfo. what="seq") res <.y[c("A".file("extdata".ApplyPileupsParam(which=which.bam".applyPileups(fls. res1) ## yield by position. fl). param=param) sapply(res. calcInfo. "ex1. yieldSize=500L.. "G". mustWork=TRUE) fls <. 1000).

which = GRanges().ApplyPileupsParam 5 ApplyPileupsParam Parameters for creating pileups from BAM files Description Use ApplyPileupsParam() to create a parameter object influencing what fields and which records are used to calculate pile-ups. maxDepth = 250L.value plpWhich(object) plpWhich(object) <. what = c("seq". minDepth = 0L. restricting various aspects of reads to be included or excluded. . Usage # Constructor ApplyPileupsParam(flag = scanBamFlag(). minMapQuality = 0L.value plpMinBaseQuality(object) plpMinBaseQuality(object) <. yieldSize = 1L. and to influence the values returned.value ## S4 method for signature 'ApplyPileupsParam' show(object) Arguments flag An instance of the object returned by scanBamFlag. "position").value plpWhat(object) plpWhat(object) <.value plpMinMapQuality(object) plpMinMapQuality(object) <. yieldAll = FALSE.value plpYieldSize(object) plpYieldSize(object) <.value plpMinDepth(object) plpMinDepth(object) <. yieldBy = c("range". "qual")) # Accessors plpFlag(object) plpFlag(object) <.value plpYieldBy(object) plpYieldBy(object) <.value plpMaxDepth(object) plpMaxDepth(object) <.value plpYieldAll(object) plpYieldAll(object) <. minBaseQuality The minimum read base quality below which the base is ignored when summa- rizing pileup information. minBaseQuality = 13L.

yieldSize The number of records to include in each call to FUN. By position means that FUN is invoked whenever pile-ups have been accumulated for yieldSize positions. Constructor: ApplyPileupsParam: Returns a ApplyPileupsParam object. minDepth The minimum depth of the pile-up below which the position is ignored. Slots Slot interpretation is as described in the ‘Arguments’ section. . or just those passing the fil- tering criteria of flag. yieldBy An character(1). minMapQuality An integer(1). yieldBy How records are to be counted. When yieldAll=TRUE. Functions and methods See ’Usage’ for details on invocation. what A character() instance indicating what values are to be returned. yieldAll Whether to report all positions (yieldAll=TRUE). flag Object of class integer encoding flags to be kept when they have their ’0’ (keep0) or ’1’ (keep1) bit set. minDepth An integer(1).6 ApplyPileupsParam minMapQuality The minimum mapping quality below which the entire read is ignored. value An instance to be assigned to the corresponding slot of the ApplyPileupsParam instance. object An instace of class ApplyPileupsParam. which A GRanges or RangesList instance. this can be used to limit memory consumption. value is coerced to the type of the corresponding slot. Accessors: get or set corresponding slot values. maxDepth The maximum depth of reads considered at any position. minBaseQuality. regardless of ranges in which. for setters. Objects from the Class Objects are created by calls of the form ApplyPileupsParam(). positions not passing filter criteria have ‘0’ entries in seq or qual. yieldSize An integer(1). etc. yieldAll A logical(1). By range (in which case yieldSize must equal 1) means that FUN is invoked once for each range in which. which A GRanges or RangesList instance restricting pileup calculations to the corre- sponding genomic locations. maxDepth An integer(1). One or more of c("seq". what A character(). "qual"). minBaseQuality An integer(1).

plpWhich<. plpYieldSize<.Returns or sets an integer(1) vector of yield size. plpFlag<.Returns or sets an character(1) vector determining how pileups will be returned. plpYieldAll. plpMinBaseQuality<.Returns or sets an integer(1) vector of miminum base qualities. plpYieldAll<.Returns or sets the named integer vector of flags. plpYieldBy.BamFile 7 plpFlag.Returns or sets an integer(1) vector of miminum map qualities. plpMinDepth. or just those satisfying pileup positions. avoiding costly index re-loading. plpMinMapQuality<. plpWhat. plpWhich. plpMinBaseQuality. plpYieldBy<. plpMaxDepth. BamFileList() provides a convenient way of managing a list of BamFile instances. plpMaxDepth<.Returns or sets an logical(1) vector indicating whether all positions. Examples example(applyPileups) BamFile Maintain and use BAM files Description Use BamFile() to create a reference to a BAM file (and optionally its index). Methods: show Compactly display the object.Returns or sets an integer(1) vector of the maximum depth to which pileups are calculated. plpYieldSize. .Returns or sets the object influencing which locations pileups are calcu- lated over. see scanBamFlag. Author(s) Martin Morgan See Also applyPileups. The reference remains open across calls to methods. plpMinMapQuality.Returns or sets an integer(1) vector of miminum pileup depth. plpMinDepth<. plpWhat<. are to be returned.Returns or sets the character vector describing what summaries are re- turned by pileup.

8 BamFile

Usage

## Constructors

BamFile(file, index=file, ..., yieldSize=NA_integer_, obeyQname=FALSE,
asMates=FALSE, qnamePrefixEnd=NA, qnameSuffixStart=NA)
BamFileList(..., yieldSize=NA_integer_, obeyQname=FALSE, asMates=FALSE,
qnamePrefixEnd=NA, qnameSuffixStart=NA)

## Opening / closing

## S3 method for class 'BamFile'
open(con, ...)
## S3 method for class 'BamFile'
close(con, ...)

## accessors; also path(), index(), yieldSize()

## S4 method for signature 'BamFile'
isOpen(con, rw="")
## S4 method for signature 'BamFile'
isIncomplete(con)
## S4 method for signature 'BamFile'
obeyQname(object, ...)
obeyQname(object, ...) <- value
## S4 method for signature 'BamFile'
asMates(object, ...)
asMates(object, ...) <- value
## S4 method for signature 'BamFile'
qnamePrefixEnd(object, ...)
qnamePrefixEnd(object, ...) <- value
## S4 method for signature 'BamFile'
qnameSuffixStart(object, ...)
qnameSuffixStart(object, ...) <- value

## actions

## S4 method for signature 'BamFile'
scanBamHeader(files, ..., what=c("targets", "text"))
## S4 method for signature 'BamFile'
seqinfo(x)
## S4 method for signature 'BamFileList'
seqinfo(x)
## S4 method for signature 'BamFile'
filterBam(file, destination, index=file, ...,
filter=FilterRules(), indexDestination=TRUE,
param=ScanBamParam(what=scanBamWhat()))
## S4 method for signature 'BamFile'
indexBam(files, ...)
## S4 method for signature 'BamFile'
sortBam(file, destination, ..., byQname=FALSE, maxMemory=512)
## S4 method for signature 'BamFileList'

BamFile 9

mergeBam(files, destination, ...)

## reading

## S4 method for signature 'BamFile'
scanBam(file, index=file, ..., param=ScanBamParam(what=scanBamWhat()))

## counting

## S4 method for signature 'BamFile'
idxstatsBam(file, index=file, ...)
## S4 method for signature 'BamFile'
countBam(file, index=file, ..., param=ScanBamParam())
## S4 method for signature 'BamFileList'
countBam(file, index=file, ..., param=ScanBamParam())
## S4 method for signature 'BamFile'
quickBamFlagSummary(file, ..., param=ScanBamParam(), main.groups.only=FALSE)

Arguments
... Additional arguments.
For BamFileList, this can either be a single character vector of paths to BAM
files, or several instances of BamFile objects. When a character vector of paths,
a second named argument ‘index’ can be a character() vector of length equal
to the first argument specifying the paths to the index files, or character() to
indicate that no index file is available. See BamFile.
con An instance of BamFile.
x, object, file, files
A character vector of BAM file paths (for BamFile) or a BamFile instance (for
other methods).
index character(1); the BAM index file path (for BamFile); ignored for all other meth-
ods on this page.
yieldSize Number of records to yield each time the file is read from with scanBam. See
‘Fields’ section for details.
asMates Logical indicating if records should be paired as mates. See ‘Fields’ section for
details.
qnamePrefixEnd Single character (or NA) marking the end of the qname prefix. When specified,
all characters prior to and including the qnamePrefixEnd are removed from
the qname. If the prefix is not found in the qname the qname is not trimmed.
Currently only implemented for mate-pairing (i.e., when asMates=TRUE in a
BamFile.
qnameSuffixStart
Single character (or NA) marking the start of the qname suffix. When specified,
all characters following and including the qnameSuffixStart are removed from
the qname. If the suffix is not found in the qname the qname is not trimmmed.
Currently only implemented for mate-pairing (i.e., when asMates=TRUE in a
BamFile.
obeyQname Logical indicating if the BAM file is sorted by qname. In Bioconductor > 2.12
paired-end files do not need to be sorted by qname. Instead use asMates=TRUE
for reading paired-end data. See ‘Fields’ section for details.
value Logical value for setting asMates and obeyQname in a BamFile instance.

10 BamFile

what For scanBamHeader, a character vector specifying that either or both of c("targets", "text")
are to be extracted from the header; see scanBam for additional detail.
filter A FilterRules instance. Functions in the FilterRules instance should expect
a single DataFrame argument representing all information specified by param.
Each function must return a logical vector, usually of length equal to the num-
ber of rows of the DataFrame. Return values are used to include (when TRUE)
corresponding records in the filtered BAM file.
destination character(1) file path to write filtered reads to.
indexDestination
logical(1) indicating whether the destination file should also be indexed.
byQname, maxMemory
See sortBam.
param An optional ScanBamParam instance to further influence scanning, counting, or
filtering.
rw Mode of file; ignored.
main.groups.only
See quickBamFlagSummary.

Objects from the Class
Objects are created by calls of the form BamFile().

Fields
The BamFile class inherits fields from the RsamtoolsFile class and has fields:

yieldSize: Number of records to yield each time the file is read from using scanBam or, when
length(bamWhich()) != 0, a threshold which yields records in complete ranges
whose sum first exceeds yieldSize. Setting yieldSize on a BamFileList does not alter
existing yield sizes set on the individual BamFile instances.
asMates: A logical indicating if the records should be returned as mated pairs. When TRUE
scanBam attempts to mate (pair) the records and returns two additional fields groupid and
mate_status. groupid is an integer vector of unique group ids; mate_status is a factor with
level mated for records successfully paired by the algorithm, ambiguous for records that are
possibly mates but cannot be assigned unambiguously, or unmated for reads that did not have
valid mates.
Mate criteria:
• Bit 0x40 and 0x80: Segments are a pair of first/last OR neither segment is marked first/last
• Bit 0x100: Both segments are secondary OR both not secondary
• Bit 0x10 and 0x20: Segments are on opposite strands
• mpos match: segment1 mpos matches segment2 pos AND segment2 mpos matches seg-
ment1 pos
• tid match
Flags, tags and ranges may be specified in the ScanBamParam for fine tuning of results.
obeyQname: A logical(0) indicating if the file was sorted by qname. In Bioconductor > 2.12
paired-end files do not need to be sorted by qname. Instead set asMates=TRUE in the BamFile
when using the readGAlignmentsList function from the GenomicAlignments package.

Author(s) Martin Morgan and Marc Carlson . returning the result of indexBam applied to the specified path. asMates<. mergeBam Merge several BAM files into a single BAM file. See mergeBam for details. additional arguments supported by mergeBam. returning the result of sortBam applied to the specified path. close.Return or set a logical(0) indicating if the file was sorted by qname. yieldSize<. returning (invisibly) the updated BamFile.Return or set a logical(0) indicating if the records should be returned as mated pairs.Return or set an integer(1) vector indicating yield size. show Compactly display the object. A single file can be filtered to one or several destinations. files. isOpen Tests whether the BamFile con has been opened for reading. Opening / closing: open. sortBam Visit the path in path(file).character-method are also available for BamFileList. returning the information contained in the file header. isIncomplete Tests whether the BamFile con is niether closed nor at the end of the file. Accessors: path Returns a character(1) vector of BAM path names.BamFile Opens the (local or remote) path and index (if bamIndex is not character(0)). The instance may be re-opened with open. index Returns a character(0) or character(1) vector of BAM index path names. Methods: scanBamHeader Visit the path in path(file). quickly returning a data. countBam Visit the path(s) in path(file). see scanBamHeader. seqlength Visit the path in path(file). obeyQname<. returning a Seqinfo. asMates. seqnames. seqinfo. mapped (number of mapped reads on seqnames) and unmapped (number of un- mapped reads). Seqnames are ordered as they appear in the file. scanBam Visit the path in path(file).BamFile.frame with columns seqnames. seqlength. filterBam Visit the path in path(file). obeyQname. returning the result of filterBam applied to the specified path. returning the result of scanBam applied to the specified path. returning the result of countBam applied to the speci- fied path.BamFile 11 Functions and methods BamFileList inherits additional methods from RsamtoolsFileList and SimpleList. indexBam Visit the path in path(file). as described in filterBam. idxstatsBam Visit the index in index(file). yieldSize. character. or named integer vector containing information on the anmes and / or lengths of each sequence.BamFile Closes the BamFile con. Returns a BamFile instance.

"ex1.".system.500 ## Some applications append a filename (e.file("extdata".BamFile(fl)) head(seqlengths(bf)) # sequences and lengths in BAM file if (require(RNAseqData.g. "ex1. scanBam() will iterate ## through the file in chunks. ## fl <. package="Rsamtools".bam"..system. 'qnamePrefixEnd' and ## 'qnameSuffixStart' can be used to trim an unwanted prefix or suffix.HNRNPC. ## This may result in a unique qname for each record which presents a ## problem when mating paired-end reads (identical qnames is one ## criteria for paired-end mating). readGAlignmentPairs. package="Rsamtools") bf <.TRUE ## When 'yieldSize' is set. Examples ## ## BamFile options. See 'asMates' above for details of the pairing ## algorithm.file("extdata".BamFile(fl) bf ## When 'asMates=TRUE' scanBam() reads the data in as ## pairs.chr14)) { bfl <.12 BamFile See Also • The readGAlignments. qnamePrefixEnd(bf) <.HNRNPC. yieldSize(bf) <. mustWork=TRUE) (bf <.bam."/" qnameSuffixStart(bf) <. ## fl <.chr14_BAMFILES) bfl bfl[1:2] # subset bfl[[1]] # select first element -.bam".BamFile ## merged across BAM files seqinfo(bfl) head(seqlengths(bfl)) } length(scanBam(fl)[[1]][[1]]) # all records . and readGAlignmentsList functions de- fined in the GenomicAlignments package." ## ## Reading Bam files. NCBI Sequence Read ## Archive (SRA) toolkit) or allele identifier to the sequence qname. • summarizeOverlaps and findSpliceOverlaps-methods in the GenomicAlignments package for methods that work on a BamFile and BamFileList objects.BamFileList(RNAseqData. asMates(bf) <.bam.

collapse=TRUE) }. .. . param=ScanBamParam()) idxstatsBam(file. .... index. baseOnly=TRUE. Usage scanBam(file.open(BamFile(fl)) sapply(seq_len(length(rng)).length(scanBam(bf)[[1]][[1]])) cat("records:".. IRanges(1.open(BamFile(fl.scanBam(bamFile. "". .) scanBamHeader(files. "". and other operations to manipulate BAM files. c(1575.GRanges(c("seq1". bf <.. file). param=ScanBamParam(what=scanBamWhat())) countBam(file.. indexDestination=TRUE) asSam(file.. rng <. . .. index=file. and merge ‘BAM’ (binary alignment) files. what="seq") bam <.. destination=sub("\. nrec.BamInput 13 bf <. function(i..) ## S4 method for signature 'character' asBam(file.sam(\.gz)?". 1584))) bf <. . "".sam(\. index=file.gz)?". file). "". "seq2").) ## S4 method for signature 'character' scanBamHeader(files. file). "\n") close(bf) ## Repeatedly visit multiple ranges in the BamFile.) asBam(file. destination=sub("\. param=param)[[1]] alphabetFrequency(bam[["seq"]]. overwrite=FALSE. rng) close(bf) BamInput Import. index=file..bam".. yieldSize=1000)) while (nrec <.. count.. rng) { param <.bam". bamFile. file).. bf.ScanBamParam(which=rng[i]. sort. scanBam(fl)) close(bf) ## Use 'yieldSize' to iterate through a file in chunks. Description Import binary ‘BAM’ files into a list structure. .open(BamFile(fl)) # implicit index bf identical(scanBam(bf)... destination=sub("\. with facilities for selecting what fields and which records are imported. filter. . destination=sub("\.) ## S4 method for signature 'character' asSam(file..

filter=FilterRules()... overwrite=FALSE) filterBam(file.. indexDestination=TRUE. byQname=FALSE. .) ## S4 method for signature 'character' sortBam(file. destination.) ## S4 method for signature 'character' filterBam(file. destination. as described below...bam” file suffix. .. For asBam asSam. ...) ## S4 method for signature 'character' mergeBam(files... addRG = FALSE.. destination The character(1) file name of the location where the sorted.. passed to methods.. . . header A character(1) file path for the header information to be used in the merged BAM file. destination. byQname = FALSE.. . addRG A logical(1) indicating whether the file name should be used as RG (read group) tag in the merged BAM file. .. . indexDestination = FALSE) Arguments file The character(1) file name of the ‘BAM’ (’SAM’ for asBam) file to be processed. destination. destination.. this is given without the ’.. index=file. filter A FilterRules instance allowing users to filter BAM files based on arbitrary criteria. region A GRanges() instance with <= 1 elements. overwrite A logical(1) indicating whether the destination can be over-written if it already exists.) mergeBam(files. index The character(1) name of the index file of the ’BAM’ file being processed. maxMemory=512) indexBam(files.. and sortBam this is without the “. must satisfy length(files) >= 2. files The character() file names of the ‘BAM’ file to be processed.bai’ extension.. destination.. byQname A logical(1) indicating whether the sorted destination file should be sorted by Query-name (TRUE) or by mapping position (FALSE). region = GRanges().) ## S4 method for signature 'character' indexBam(files. specifying the region of the BAM files to merged. filtered.. index=file. indexDestination A logical(1) indicating whether the created destination file should also be in- dexed. header = character().. compressLevel1 = FALSE. Additional arguments.. param=ScanBamParam(what=scanBamWhat())) sortBam(file.. .14 BamInput . overwrite = FALSE. or merged output file will be created. For mergeBam. .

Functions in the FilterRules instance should expect a single DataFrame argument representing all information specified by param. see BamFile-class). asBam converts ’SAM’ files to ’BAM’ files. param An instance of ScanBamParam. The original file is then separately filtered into destination[[i]].g. mergeBam merges 2 or more sorted BAM files. "text") can be used to specify which elements of the header are returned. Each function must return a logical vector equal to the number of rows of the DataFrame. Details The scanBam function parses binary BAM files.BamInput 15 compressLevel1 A logical(1) indicating whether the merged BAM file should be compressed to zip level 1. equivalent to samtools view file > destination. filterBam parses records in file. the RG (read group) dictionary in the header of the BAM files is not reconstructed. in which case destination must be a character vector of equal length. These records are then parsed to a DataFrame and made available for further filtering by user-supplied FilterRules. and that the file be named following samtools convention as <bam_filename>. filter may be a list of FilterRules instances. as described below. analogous to the ‘samtools index’ function. using filter[[i]] as the filter criterion. more targets may be present that are represented by reads in the file. The BAM file is created at destination. especially with arguments what to control the fields that are parsed. Records satisfying the bamWhich bamFlag and bamSimpleCigar criteria of param are accumulated to a default of yieldSize = 1000000 records (change this by specifying yieldSize when creating a BamFile instance. and quickly returns the number of mapped and un- mapped reads on each seqname. An index file is created on the destina- tion when indexDestination=TRUE. specifying the components of the BAM records to be returned.and time-efficient to filter using bamWhich. countBam returns a count of records consistent with param.. scanBamHeader visits the header information in a BAM file. if appropriate. ScanBamParam can contain a field what. This requires that the BAM file be indexed. text SAM files can be parsed using R’s scan func- tion. equivalent to samtools view -Sb file > destination. The SAM / BAM specification does not require that the content of the header be consistent with the content of the file. It is more space.bai. sortBam sorts the BAM file given as its first argument. than to supply FilterRules. idxstatsBam visit the index in index(file). returning for each file a list containing elements targets and text. An optional character vector argument containing one or two elements of what=c("targets". ScanBamParam can contain an argument tag to specify which tags will be extracted. Valid values of what are available with scanBamWhat. and bamSimpleCigar. The ’BAM’ file is sorted and an index created on the destination (with extension ’. Return values are used to include (when TRUE) corresponding records in the filtered BAM file. . several salient points are reiterated here. indexBam creates an index for each BAM file specified. e. Details of the ScanBamParam class are provide on its help page. ScanBamParam can contain an argument which that specifies a subset of reads to return. bamFlag. As with samtools. This influences what fields and which records are imported.bai’) when indexDestination=TRUE. analogous to the “samtools sort” function. maxMemory A numerical(1) indicating the maximal amount of memory (in MB) that the function is allowed to use. asSam converts ’BAM’ files to ’SAM’ files.

Each inner list contains elements named after scanBamWhat(). i. • strand: The strand to which the read is aligned.4. and scanBamFlag(). ambiguous. • isize: This is the TLEN field in SAM Spec v1. sortBam returns the file name of the sorted file. scanBamHeader returns a list. • cigar: This is the CIGAR field in SAM Spec v1. • qual: This is the QUAL field in SAM Spec v1. in the 5’ to 3’ orientation..4. unmated) of each record.. • mrnm: This is the RNEXT field in SAM Spec v1. The outer list groups results from each Ranges list of bamWhich(param). seqlength. phred-scaled base quality score. The CIGAR string.4.4. it is the reverse complement of the original sequence. . The name of the reference to which the read is aligned.g. ‘@SQ’) found in the header. This is a factor indicating status (mated.4. • mate_status: Returned (always) when asMates=TRUE in a BamFile object. identifier. • mpos: This is the PNEXT field in SAM Spec v1. • groupid: This is an integer vector of unique group ids returned when asMates=TRUE in a BamFile object. associ- ated with the read. The content of non-null elements are as follows.character-method returns a list of lists. groupid values are used to create the partitioning for a GAlignmentsList object. with one element for each file named in files. • seq: This is the SEQ field in SAM Spec v1. The position to which the mate aligns. even though soft-clipped nucleotides are included in seq. The genomic coordinate at the start of the alignment.. The query sequence. • rname: This is the RNAME field in SAM Spec v1. asBam. idxstatsBam returns a data.4. normally equal to the width of the query returned in seq. and the associated tag values. oriented as seq. The position excludes clipped nucleotides. The MAPping Quality. • pos: This is the POS field in SAM Spec v1. A numeric value summarizing details of the read. The query name. The reference to which the mate (of a paired end or mate pair read) aligns. as calculated from the cigar encoding. mapped (number of mapped reads on seqnames) and unmapped (number of unmapped reads).4. the outer list is of length one when bamWhich(param) has length 0. • flag: This is the FLAG field in SAM Spec v1. The list contains two element. The text element is itself a list with each element a list corresponding to tags (e. • mapq: This is the MAPQ field in SAM Spec v1. If aligned to the minus strand. taken from the description in the samtools API documentation: • qname: This is the QNAME field in SAM Spec v1. • qwidth: The width of the query.e.4.4. and 1- based.4. elements omitted from bamWhat(param) are removed. Inferred insert size for paired end alignments.e. filterBam returns the file name of the destination file created. asSam return the file name of the destination file. See ScanBamParam and the flag argument. indexBam returns the file name of the index file created.16 BamInput Value The scanBam. The targets element contains target (reference) sequence lengths. Phred-encoded.frame with columns seqnames. i. Coordinates are ‘left-most’.4. at the 3’ end of a read on the ’-’ strand.

useNA="always") table(res0[["strand"]]. c(1000. "strand".org>. References http://samtools. param=p1) names(res1) names(res1[[2]]) p2 <.ScanBamParam(what=c("rname". param=p2) p3 <. useNA="always") # query widths derived from cigar table(res0[["cigar"]]. useNA="always") table(res0[["flag"]].at> (sortBam).scanBam(fl)[[1]] # always list-of-lists names(res0) length(res0[["qname"]]) lapply(res0. what=scanBamWhat()) res1 <. head. Thomas Unterhiner <thomas. mustWork=TRUE) ## ## scanBam ## res0 <.bam". 2000))) p1 <. 2000). "pos".file("extdata". "ex1. useNA="always") which <.scanBam(fl. scanBamFlag Examples fl <.scanBam(fl. package="Rsamtools".ScanBamParam(which=which.unterthiner@students. 1000). seq2=IRanges(c(100. scanBamWhat. "qwidth")) res2 <.sourceforge. param=p3)[[1]]$flag) ## ## idxstatsBam ## idxstatsBam(fl) ## ## filterBam ## .BamInput 17 Author(s) Martin Morgan <mtmorgan@fhcrc.net/ See Also ScanBamParam.RangesList(seq1=IRanges(1000.system.ScanBamParam( what="flag".jku. 3) table(width(res0[["seq"]])) # query widths table(res0[["qwidth"]]. # information to query from BAM file flag=scanBamFlag(isMinusStrand=FALSE)) length(scanBam(fl.

tempfile().scanBam(dest.unlist(split(seq_along(gwhich).countBam(fl. param=param. filter=filters) lapply(dest. 1.list( FilterRules(list(long=function(x) width(x$seq) > 35)). .filterBam(fl.g. queried using scanBam) as a single ‘experiment’. param=param) countBam(dest) ## 3271 records ## filter to a single file filter <.filterBam(fl. param=ScanBamParam(which=gwhich)) reorderIdx <. FilterRules(list(short=function(x) width(x$seq) <= 35)) ) destinations <. filter=filter) countBam(dest) ## 398 records res3 <.] BamViews Views into a set of BAM files Description Use BamViews() to reference a set of disk-based BAM files to be processed (e.FilterRules(list(MinWidth = function(x) width(x$seq) > 35)) dest <.sortBam(fl. seqnames(gwhich))) cnt cnt[reorderIdx. bamIndicies=bamPaths. what="seq") dest <. "GRanges")[c(2. tempfile(). tempfile()) dest <. destinations.18 BamViews param <. param=ScanBamParam(what="seq"))[[1]] table(width(res3$seq)) ## filter 1 file to 2 destinations filters <. recover original order ## gwhich <. param=param.. Usage ## Constructor BamViews(bamPaths=character(0).as(which.filterBam(fl. 3)] # example data cnt <.replicate(2. tempfile()) ## ## scanBamParam re-orders 'which'.ScanBamParam( flag=scanBamFlag(isUnmappedQuery=FALSE). countBam) ## ## sortBam ## sorted <.

drop=TRUE] ## Input ## S4 method for signature 'BamViews' scanBam(file.. index = file. . j.. .. . .value bamExperiment(x) ## S4 method for signature 'BamViews' names(x) ## S4 replacement method for signature 'BamViews' names(x) <.. bamRanges.names=make. .. param = ScanBamParam(what=scanBamWhat())) ## S4 method for signature 'BamViews' countBam(file..ANY.) ## S4 method for signature 'missing' BamViews(bamPaths=character(0).value ## S4 method for signature 'BamViews' dimnames(x) ## S4 replacement method for signature 'BamViews. bamIndicies=bamPaths.names=make. drop=TRUE] ## S4 method for signature 'BamViews.. index = file. param = ScanBamParam()) ## Show ## S4 method for signature 'BamViews' show(object) Arguments bamPaths A character() vector of BAM path names. bamExperiment = list()....range=FALSE) ## Accessors bamPaths(x) bamSamples(x) bamSamples(x) <.. . .missing' x[i... . bamExperiment = list()... bamRanges. j.value ## Subset ## S4 method for signature 'BamViews.missing..) <.unique(basename(bamPaths))).value bamRanges(x) bamRanges(x) <. j. containing sample information associated with each path..unique(basename(bamPaths))). without the ‘.. Ranges are not validated against the BAM files. bamSamples A DataFrame instance with as many rows as length(bamPaths).BamViews 19 bamSamples=DataFrame(row. bamIndicies A character() vector of BAM index file path names. drop=TRUE] ## S4 method for signature 'BamViews.ANY' x[i.value bamDirname(x..ANY' x[i. bamRanges Missing or a GRanges instance with ranges defined on the reference chromo- somes of the BAM files. auto.bai’ extension. bamSamples=DataFrame(row.ANY...ANY' dimnames(x) <. ..

object An instance of BamViews. bamRanges<. bamSamples Returns a DataFrame instance with as many rows as length(bamPaths). Objects from the Class Objects are created by calls of the form BamViews(). auto.. bamSamples A DataFrame instance with as many rows as length(bamPaths). i During subsetting. bamRanges A GRanges instance with ranges defined on the spaces of the BAM files. populate the ranges with the union of ranges returned in the target element of scanBamHeader. Ranges are not validated against the BAM files. Ranges are not validated against the BAM files. bamIndicies A character() vector of BAM index path names. containing sample information associated with each path. containing sample information associated with each path. a logical or numeric index into bamSamples and bamPaths. j During subsetting. file An instance of BamViews. ignored by all BamViews subsetting methods. bamSamples<. bamIndicies Returns a character() vector of BAM index path names.range If TRUE and all bamPaths exist. Functions and methods See ’Usage’ for details on invocation.20 BamViews bamExperiment A list() containing additional information about the experiment. Ranges are not validated against the BAM files. a logical or numeric index into bamRanges.Assign a DataFrame instance with as many rows as length(bamPaths). Additional arguments.Assign a GRanges instance with ranges defined on the spaces of the BAM files. bamExperiment A list() containing additional information about the experiment. bamRanges Returns a GRanges instance with ranges defined on the spaces of the BAM files. index A character vector of indices. value An object of appropriate type to replace content. param An optional ScanBamParam instance to further influence scanning or counting.. Accessors: bamPaths Returns a character() vector of BAM path names. contain- ing sample information associated with each path. drop A logical(1). . corresponding to the bamPaths(file). Slots bamPaths A character() vector of BAM path names. . Constructor: BamViews: Returns a BamViews object. x An instance of BamViews.

width=500). file). Count = seq_len(18L)) v <. 9)). ranges = c(IRanges(seq(10000.BamViews(fls.. IRanges(seq(100000. bamRanges(file) takes precedence over bamWhich(param). 11:15).]) bamDirname(v) <. "ex1. names Return the column names of the BamViews instance. Usage ## Constructors BcfFile(file. bamRanges(file) takes precedence over bamWhich(param).Assign the row and column names of the BamViews instance. same as names(bamSamples(x)).] bamRanges(v[c(1:5. show Compactly display the object.BcfFile 21 bamExperiment Returns a list() containing additional information about the experiment. width=5000)). Description Use BcfFile() to create a reference to a BCF (and optionally its index). mode=ifelse(grepl("\. 10000). returning the result of countBam applied to the specified path. 90000. Methods: "[" Subset the object by bamRanges or bamSamples. bamRanges=rngs) v v[1:5. "rb".getwd() v BcfFile Manipulate BCF files. dimnames Return the row and column names of the BamViews instance. countBam Visit each path in bamPaths(file). names<.file("extdata". 900000.Assign the column names of the BamViews instance. Author(s) Martin Morgan Examples fls <.system. The reference remains open across calls to methods. index = file. "r")) BcfFileList(. mustWork=TRUE) rngs <.GRanges(seqnames = Rle(c("chr1". dimnames<.bam". avoiding costly index re-loading. "chr2"). returning the result of scanBam applied to the spec- ified path. scanBam Visit each path in bamPaths(file).) . package="Rsamtools".bcf$". 100000).. c(9. BcfFileList() provides a convenient way of managing a list of BcfFile instances.

mode A character(1) vector.. index() ## S4 method for signature 'BcfFile' isOpen(con. (for indexBcf) an instance of BcfFile point to a BCF file.) ## accessors. also path(). .22 BcfFile ## Opening / closing ## S3 method for class 'BcfFile' open(con. Objects from the Class Objects are created by calls of the form BcfFile(). object An instance of BcfFile.. rw="") bcfMode(object) ## actions ## S4 method for signature 'BcfFile' scanBcfHeader(file. . . param An optional ScanBcfParam instance to further influence scanning. file A character(1) vector of the BCF file path or. Functions and methods BcfFileList inherits methods from RsamtoolsFileList and SimpleList... files. . mode="r" a text (VCF) file.. Returns a BcfFile instance.. Opening / closing: open. rw Mode of file. mode="rb" indicates a binary (BCF) file. this can either be a single character vector of paths to BCF files. For BcfFileList. Additional arguments. Fields The BcfFile class inherits fields from the RsamtoolsFile class. .. ... param=ScanBcfParam()) ## S4 method for signature 'BcfFile' indexBcf(file.. .BcfFile Opens the (local or remote) path and index (if bamIndex is not character(0)).) ## S3 method for class 'BcfFile' close(con..) ## S4 method for signature 'BcfFile' scanBcf(file. or several instances of BcfFile objects.. index A character(1) vector of the BCF index.. ignored.) Arguments con.

or index variant call files in text or binary format.system. IRanges(1. rng) { param <. bf.BcfInput 23 close. "seq2"). "ex1. . c(1575.file("extdata". param=param)[[1]] ## do extensive work with bcf isOpen(bf) ## file remains open }. bcfFile. function(i. param=param) ## all ranges ## ranges one at a time 'bf' open(bf) sapply(seq_len(length(rng)).BcfFile Closes the BcfFile con. 1584))) param <. show Compactly display the object. bcfMode Returns a character(1) vector BCF mode. index Returns a character(1) vector of BCF index name.BcfFile(fl) # implicit index bf identical(scanBcf(bf). returning (invisibly) the updated BcfFile.ScanBcfParam(which=rng) bcf <.bcf". Methods: scanBcf Visit the path in path(file). rng) BcfInput Operations on ‘BCF’ files.BcfFile. coerce. Description Import. mustWork=TRUE) bf <. Accessors: path Returns a character(1) vector of the BCF path name. The instance may be re-opened with open.ScanBcfParam(which=rng) bcf <. package="Rsamtools". Author(s) Martin Morgan Examples fl <.scanBcf(bcfFile. scanBcf(fl)) rng <. returning the result of scanBcf applied to the specified path.GRanges(c("seq1".scanBcf(bf.

overwrite A logical(1) indicating whether the destination can be over-written if it already exists. overwrite=FALSE. . destination The character(1) file name of the location where the BCF output file will be created... For asBcf this is without the “.. ... . e. dictionary. for scanBcfHeader. .g. mode of BcfFile. indexDestination=TRUE) indexBcf(file.. Each element of the list is itself a list containing three elements. indexDestination A logical(1) indicating whether the created destination file should also be in- dexed.24 BcfInput Usage scanBcfHeader(file..) ## S4 method for signature 'character' indexBcf(file.. .. param A instance of ScanBcfParam influencing which records are parsed and the ‘INFO’ and ‘GENO’ information returned.. .. For similar functions operating on VCF files see ?scanVcf in the VariantAnnotation package. dictionary a character vector of the unique “CHROM” names in the VCF file. The reference element is a character() vector with names of reference sequences.character-method.. but requires that the file be BCF or bgzip’d and indexed as a TabixFile. . .) ## S4 method for signature 'character' scanBcf(file. indexDestination=TRUE) ## S4 method for signature 'character' asBcf(file.. The sample element is a character() vector of names of samples.. destination.. param=ScanBcfParam()) asBcf(file.) scanBcf(file.. index The character() file name(s) of the ‘BCF’ index to be processed. with one element for each file named in file. destination..bcf” file suffix. Details bcf* functions are restricted to the GENO fields supported by ‘bcftools’ (see documentation at the url below). . overwrite=FALSE.) ## S4 method for signature 'character' scanBcfHeader(file. dictionary. . Additional arguments. Value scanBcfHeader returns a list. or an instance of class BcfFile. index = file.... the character() file name of the ‘BCF’ file to be processed... The argument param allows portions of the file to be input.) Arguments file For scanBcf and scanBcfHeader.

See Also BcfFile. sub("\. REF. http://samtools.gz$".bcf". with elements corresponding to fields supported by ‘bcftools’ (see documentation at the url below).rz". scanBcf returns a list. corresponding to the columns of the VCF specification: CHROM.Compression 25 The header element is a character() vector of the header lines (preceeded by “##”) present in the VCF file. Usage bgzip(file. "".sourceforge. dest=sprintf("%s.sourceforge. package="Rsamtools". with one element per file.org>.shtml contains information on the portion of the specification implemented by bcftools.system.file("extdata". "". asBcf creates a binary BCF file from a text VCF file.bgz".net/ provides information on samtools. Author(s) Martin Morgan <mtmorgan@fhcrc.net/specs.gz$". ALTQUAL. POS. FILTER. file)). FORMAT. overwrite = FALSE) razip(file. overwrite = FALSE) . mustWork=TRUE) scanBcfHeader(fl) bcf <. file)).html outlines the VCF specification. ID. INFO.sourceforge. dest=sprintf("%s. References http://vcftools. "ex1. The GENO element is itself a list. GENO. TabixFile Examples fl <. Each list has 9 elements.scanBcf(fl) ## value: list-of-lists str(bcf[1:8]) names(bcf[["GENO"]]) str(head(bcf[["GENO"]][["PL"]])) example(BcfFile) Compression File compression for tabix (bgzip) and fasta (razip) files. http://samtools. indexBcf creates an index into the BCF file.net/mpileup. sub("\. Description These functions compress files for use in other parts of Rsamtools: bgzip for tabix files. razip for random-access fasta files.

FaFile. "ex1. Value The full path to dest. Details For yieldTabix. This file will be compressed.org>. This will be the compressed file. mustWork=TRUE) to <. to) deprecated Deprecated functions Description Functions listed on this page are no longer supported.system. Author(s) Martin Morgan <mtmorgan@fhcrc. use REDUCEsampler with reduceByYield in the GenomicFiles package.net/ See Also TabixFile.sam". For BamSampler.26 deprecated Arguments file A character(1) path to an existing uncompressed or gz-compressed file.bgzip(from. If dest exists. Author(s) Martin Morgan <mtmorgan@fhcrc. use the yieldSize argument of TabixFiles. if it already exists. . Examples from <.file("extdata". package="Rsamtools". overwrite A logical(1) indicating whether dest should be over-written. then it is only over-written when overwrite=TRUE.tempfile() zipped <.org> References http://samtools.sourceforge. dest A character(1) path to a file.

.FaFile 27 FaFile Manipulate indexed fasta files. "GRanges")) ## S4 method for signature 'FaFile' seqinfo(x) ## S4 method for signature 'FaFile' countFa(file. as=c("DNAStringSet".. . .. . index() ## S4 method for signature 'FaFile' isOpen(con.GRanges' scanFa(file. Usage ## Constructors FaFile(file.fai". Description Use FaFile() to create a reference to an indexed fasta file..) ## Opening / closing ## S3 method for class 'FaFile' open(con. . index=sprintf("%s.....RangesList' scanFa(file.. . "RNAStringSet".... FaFileList() provides a convenient way of managing a list of FaFile instances. rw="") ## actions ## S4 method for signature 'FaFile' indexFa(file. file).) ## S4 method for signature 'FaFile' scanFaIndex(file... as=c("GRangesList". The reference remains open across calls to methods. . param.. "AAStringSet")) ## S4 method for signature 'FaFile.) ## S3 method for class 'FaFile' close(con. also path(). param. avoiding costly index re-loading..... ..) FaFileList(..) ## accessors. .) ## S4 method for signature 'FaFileList' scanFaIndex(file.. ..) ## S4 method for signature 'FaFile..

or an instance of class FaFile or FaFileList (for scanFaIndex. close..missing-method this can include arguemnts to readDNAStringSet / readRNAStringSet / readAAStringSet when param is ‘missing’. default DNAStringSet. ignored.. param. with index information from each file is returned as an element of the list. Objects from the Class Objects are created by calls of the form FaFile(). as A character(1) vector indicating the type of object to return..) ## S4 method for signature 'FaFileList' getSeq(x.fai’ extension). The instance may be re-opened with open. .missing' scanFa(file. file. param An optional GRanges or RangesList instance to select reads (and sub-sequences) for input. x An instance of FaFile or (for getSeq) FaFileList. rw Mode of file. or several instances of FaFile objects. this can either be a single character vector of paths to BAM files. Additional arguments.FaFile Closes the FaFile con. Returns a FaFile instance. Accessors: path Returns a character(1) vector of the fasta path name. getSeq). . Functions and methods FaFileList inherits methods from RsamtoolsFileList and SimpleList. below. "RNAStringSet".FaFile Opens the (local or remote) path and index files. • For FaFileList.) Arguments con. See Methods. index Returns a character(1) vector of fasta index name (minus the ’... . "RNAStringSet". • For scanFa. • For scanFaIndex.. . index A character(1) vector of the fasta or fasta index file path (for FaFile). as=c("DNAStringSet". • For scanFa. Fields The FaFile class inherits fields from the RsamtoolsFile class. GRangesList.. param.. ..FaFile. Opening / closing: open.28 FaFile as=c("DNAStringSet". index information is collapsed across files into the unique index elements. "AAStringSet")) ## S4 method for signature 'FaFile' getSeq(x.FaFile. default GRangesList. returning (invisibly) the updated FaFile. param. "AAStringSet")) ## S4 method for signature 'FaFile.

Author(s) Martin Morgan Examples fl <. Description Scan indexed fasta (or compressed fasta) files and their indicies. the return type is a DNAStringSet.GRanges methods differ in that getSeq will reverse complement sequences selected from the minus strand. show Compactly display the object. countFa Return the number of records in the fasta file. use seqlengths(FaFile(file)) to discover sequence widths.FaFile.system. package="Rsamtools".FaInput 29 Methods: indexFa Visit the path in path(file) and create an index file (with the extension ‘. -10) # last 10 nucleotides (dna <. scanFa Return the sequences indicated by param as a DNAStringSet instance.file("extdata". When length(param)==0 no records are selected. all records are selected. start(param) and end{param} define the (1-based) region of the sequence to return. For the FaFile method. .open(FaFile(fl)) # open countFa(fa) (idx <.scanFa(fa. seqinfo Consult the index file for defined sequences (seqlevels()) and lengths (seqlengths()).scanFa(fa.FaFile and scanFa. creating a one-to-one mapping between the ith element of file and the ith element of param. The getSeq. the param argument must be a GRangesList of the same length as file. with elements of the list in the same order as the input elements.fai’). param=idx[1:2])) FaInput Operations on indexed ’fasta’ files. "ce2dict1. getSeq Returns the sequences indicated by param from the indexed fasta file(s) of file.fa". param=idx[1:2])) ranges(idx) <. Values of end(param) greater than the width of the sequence cause an error.narrow(ranges(idx).scanFaIndex(fa)) (dna <. return- ing the information as a GRanges object. mustWork=TRUE) fa <. the return type is a SimpleList of DNAStringSet instances. seqnames(param) selects the sequences to return. For the FaFileList method. scanFaIndex Read the sequence names and and widths of recorded in an indexed fasta file. When param is missing.

scanFaIndex reads the sequence names and and widths of recorded in an indexed fasta file. "RNAStringSet"... When param is GRanges(). Additional arguments. . as A character(1) vector indicating the type of object to return.) ## S4 method for signature 'character' indexFa(file.. param An optional GRanges or RangesList instance to select reads (and sub-sequences) for input. param. When param is missing. "AAStringSet")) ## S4 method for signature 'character. "RNAStringSet".. .. .. no records are selected.. . param. all records are selected. as=c("DNAStringSet". RNAStringSet... . scanFa return the sequences indicated by param as a DNAStringSet. AAStringSet instance. as=c("DNAStringSet"...RangesList' scanFa(file. .. as=c("DNAStringSet". "RNAStringSet". . .) ## S4 method for signature 'character' countFa(file.. start(param) and end{param} define the (1-based) region of the sequence to return. Values of end(param) greater than the width of the sequence are set to the width of the sequence.. "AAStringSet")) ## S4 method for signature 'character.) scanFa(file. "AAStringSet")) Arguments file A character(1) vector containing the fasta file path. as=c("DNAStringSet". seqnames(param) selects the sequences to return.. countFa returns the number of records in the fasta file..... param. "RNAStringSet". param... ..30 FaInput Usage indexFa(file..GRanges' scanFa(file. . .fai’)..) scanFaIndex(file..) countFa(file. default DNAStringSet.missing' scanFa(file. . Value indexFa visits the path in file and create an index file at the same location but with extension ‘.. "AAStringSet")) ## S4 method for signature 'character. passed to readDNAStringSet / readRNAStringSet / readAAStringSet when param is ‘missing’. return- ing the information as a GRanges object.) ## S4 method for signature 'character' scanFaIndex(file.

currently ignored.org>. mustWork=TRUE) countFa(fa) (idx <.. package="Rsamtools".gz".net/ provides information on samtools.system.) Arguments file A character(1) file path or TabixFile instance pointing to a ‘tabix’ file.gtf. Description This function queries a tabix file. idx[1:2])) ranges(idx) <. .system.. column indicies used to sort the file. References http://samtools.scanFa(fa. Examples fa <. returning the names of the ‘sequences’ used as a key when creating the file.. and the comment character used while indexing.scanFa(fa.scanFaIndex(fa)) (dna <.narrow(ranges(idx). the number of lines skipped while indexing... Usage headerTabix(file. Author(s) Martin Morgan <mtmorgan@fhcrc. -10) # last 10 nucleotides (dna <. idx[1:2])) headerTabix Retrieve sequence names defined in a tabix file.sourceforge. package="Rsamtools".fa".headerTabix 31 Author(s) Martin Morgan <mtmorgan@fhcrc.. "ce2dict1. mustWork=TRUE) headerTabix(fl) . Additional arguments.org>.file("extdata". Examples fl <. "example. Value A list(4) of the sequence names.) ## S4 method for signature 'character' headerTabix(file. . .file("extdata".

. Author(s) Martin Morgan <mtmorgan@fhcrc. "bed". Usage indexTabix(file. start indicates the column containing the start coordinate of the feature to be indexed. zeroBased=FALSE. from indexing.net/tabix. start and end position ordering. "psltbl"). "vcf4". "vcf". . comment="#". "sam". when present as the first character in a line. Description Index (with indexTabix) files that have been sorted into ascending sequence. start=integer(). seq=integer(). A characater(1) matching one of the types named in the function signature. format=c("gff".shtml . end=integer(). zeroBased A logical(1) indicating whether coordinats in the file are zero-based. format The format of the data in the compressed file. skip The number of lines to be skipped at the beginning of the file. bgzip-compressed file.. then seq indicates the column in which the ‘sequence’ identifier (e. indicates that the line is to be omitted.. Value The return value of indexTabix is an updated instance of file reflecting the newly-created index file.org>. start If format is missing. skip=0L. chrq) is to be found. Additional arguments.) Arguments file A characater(1) path to a sorted.32 indexTabix indexTabix Compress and index tabix-compatible files. end indicates the column containing the ending coordinate of the feature to be indexed.. seq If format is missing.g. comment A single character which.sourceforge... References http://samtools. end If format is missing.

index=file.bgzip(from.5+")) Arguments file character(1) or BamFile. see scanBam. perhaps used by methods.file("extdata". idx) pileup Use filters and output formats to calculate pile-up statistics for a BAM file.TabixFile(zipped. "sam") tab <. maximum number of overlapping alignments considered for each position in the pileup.frame with columns summarizing counts of reads overlapping each genomic position. pileupParam=PileupParam()) ## PileupParam constructor PileupParam(max_depth=250. BAM file path. ignore_query_Ns=TRUE. "ex1. min_nucleotide_depth=1. scanBamParam An instance of ScanBamParam. Use phred2ASCIIOffset to help translate numeric or character values to these off- sets. scanBamParam=ScanBamParam(). strand.8+". "Illumina 1. distinguish_nucleotides=TRUE. "Illumina 1. optionally differentiated on nucleotide. package="Rsamtools". "Sanger". mustWork=TRUE) to <. min_mapq=0.3+". Description pileup uses PileupParam and ScanBamParam objects to calculate pileup statistics for a BAM file.. "Solexa". an optional character(1) of BAM index file path. index When file is character(1). and position within read. pileupParam An instance of PileupParam. minimum ‘QUAL’ value for each nucleotide in an alignment. Usage pileup(file. distinguish_strands=TRUE. max_depth integer(1). .indexTabix(zipped. min_base_quality integer(1). Additional arguments. scheme= c("Illumina 1. The result is a data.pileup 33 Examples from <.tempfile() zipped <. left_bins=NULL.sam".. to) idx <. . min_base_quality=13. .... min_minor_allele_depth=0. include_deletions=TRUE.system. cycle_bins=NULL) phred2ASCIIOffset(phred=integer(). query_bins=NULL. include_insertions=FALSE.

all same sign. cycle_bins DEPRECATED. minimum count of all nucleotides other than the major allele at a given position required for a particular nucleotide to appear in the result. Floating-point values are coerced to integer. Sorted order not required. TRUE to include insertions in pileup. Minimum of two values are required so at least one range can be formed. TRUE if result should differentiate between nucleotides. all same sign.34 pileup min_mapq integer(1).8+’ scheme) or character() of printable ASCII characters. or of any length where all elements have exactly 1 character. ignore_query_Ns logical(1). Use negative values to count from the opposite end. pileup visits each position in the BAM file. discard later cycles. phred Either a numeric() (coerced to integer()) PHRED score (e. See left_bins for identical behavior. include_deletions logical(1). unique positions within a read to delimit bins. NULL (default) indicates no binning.. query_bins numeric. Use negative values to count from the opposite end. all reads overlapping the position that are consistent with constraints dictated by flag of scanBamParam and pileupParam are used for counting. scheme Encoding scheme. min_nucleotide_depth integer(1). Position within read is based on counting from the 5’ end regardless of strand. pileup also automatically excludes reads with flags that indicate any of: • unmapped alignment (isUnmappedQuery) . If you want to delimit bins based on sequencing cycles to. Ignored when phred is already character(). distinguish_nucleotides logical(1).g. counting from the 3’ end when read aligns to minus strand. 41) for the ‘Illumina 1. Floating-point values are coerced to integer. left_bins numeric. Position within a read is based on counting from the 5’ end when the read aligns to plus strand. distinguish_strands logical(1). For a given position. NULL (default) indicates no binning. used to translate numeric() PHRED scores to their ASCII code. TRUE if non-determinate nucleotides in alignments should be ex- cluded from the pileup. include_insertions logical(1). Sorted order not required. minimum ‘MAPQ’ value for an alignment to be included in pileup. TRUE if result should differentiate between strands. query_bins probably gives the desired behavior. and use of left_bins and query_bins assumes familiarity with cut.. e.g. Details NB: Use of pileup assumes familiarity with ScanBamParam. minimum count of each nucleotide (independent of other nucleotides) at a given position required for said nucleotide to appear in the result. in the range (0. unique positions within a read to delimit bins. TRUE to include deletions in pileup. Minimum of two values are required so at least one range can be formed. min_minor_allele_depth integer(1). When phred is character(). it can be of length 1 with 1 or more characters. subject to constraints implied by which and flag of scanBamParam.

Soft clip operations contribute to the relative position of nucleotides within a read. but the distinction of on which strand the nucleotide appears is ignored. such that a match with the reference genome is counted separately in the results if distinguish_nucleotides is TRUE.frame. . The cycle position of the A nucleotide will be 3. If include_deletions is TRUE and distinguish_nucleotides is TRUE. behavior reflects that of scanBam: the entire file is visited and counted. If you are confused about how the different CIGAR operations interact. distinguish_nucleotides and distinguish_strands when set to TRUE each add a column (nucleotide and strand. CIGAR support: pileup handles the extended CIGAR format by only heeding operations that con- tribute to counts (‘M’.pileup 35 • secondary alignment (isSecondaryAlignment) • not passing quality controls (isNotPassingQualityControls) • PCR or optical duplicate (isDuplicate) If no which argument is supplied to the ScanBamParam. with a CIGAR 2S1M and SEQ field ttA. ‘N’. Setting only one to TRUE behaves as expected. see the SAM versions of the BAM files used for pileup unit testing for a fairly compre- hensive human-readable treatment. ignore_query_Ns determines how ambiguous nucloetides are treated. if ignore_query_Ns is FALSE and distinguish_nucleotides is TRUE the ‘N’ nucleotide value appears in the nucleotide column when a base at a given position is ambiguous. A simple illustration is ignore_query_Ns and distinguish_nucleotides. but the reported position will be 11. further. as mentioned in the ignore_query_Ns section. am- biguous nucleotides are included in counts. CIGAR ‘N’ operations (not to be confused with ‘N’ used for non-determinate bases in the SEQ field) indicate a large skipped region. each nucleotide at a given position is counted separately. consider a sequence in a bam file that starts at 11. the nucleotide column uses a ‘-’ character to indicate a deletion at a position. If using left_bins or query_bins it might make sense to first preprocess your files to remove soft clipping. For example. • Deletions and clipping: The extended CIGAR allows a number of operations conceptually similar to a ‘deletion’ with respect to the reference genome. ‘I’). By default. See more details below. and parameters that affect filtering behavior and the schema of the results (‘behavior+structure’ policies). each com- bination of nucleotide and strand at a given position is counted separately. The ‘=’ nucleotide code in the SEQ field (to mean ‘identical to reference genome’) is supported by pileup. but offer more specialized meanings than a simple deletion. Parameters for pileupParam belong to two categories: parameters that affect only the filtering criteria (so-called ‘behavior-only’ policies). deletions with respect to the reference genome to which the reads were aligned are included in the counts in a pileup. and ‘H’ a hard clip. ‘S’. ‘D’. If both are TRUE. If ignore_query_Ns is FALSE. ‘S’ a soft clip. Use a yieldSize value when creating a BamFile instance to manage memory consumption when using pileup with large BAM files. By default. whereas hard clip and refskip operations do not. respectively) to the resulting data. There are some differences in how pileup behavior is affected when the yieldSize value is set on the BAM file. for example. and ‘H’ CIGAR operations are never counted: only counts of true deletions (‘D’ in the CIGAR) can be included by setting include_deletions to TRUE. Many of the parameters of the pileupParam interact. if only nucleotide is set to TRUE. ambiguous nu- cleotides (typically ‘N’ in BAM files) in alignments are ignored.

g. sensitive to strand). left_bins: BAM files store sequence data from the 5’ end to the 3’ end (regardless of to which strand the read aligns).g. – From effective start of reads: use positive values. this is because accuracy typically degrades in later cycles. specify the lower bound of a bin by using one less than the desired value. That means for reads aligned to the minus strand. Information about e. The most common use of a bin argument is to bin later cycles separately from earlier cycles. For query_bin the effective start is the first cycle of the read as it came off the sequencer. ‘P’ CIGAR (padding) operations never affect counts. query_bins yields the correct behavior because bin positions are relative to cycle order (i. left_bins (in contrast with query_bins) determines bin positions from the 5’ end.last].. If a “bin” argument is set pileup automatically excludes bases outside the implicit outer range. use -Inf as the lower bound on a bin that extends to the beginning of all reads.last]. that means the 5’ end for reads aligned to the plus strand and 3’ for reads aligned to the minus strand.. and (+)Inf. regardless of strand. Here are some points to keep in mind when using pileup in conjunction with yieldSize: . Because cut produces ranges in the form (first. Take care to use the appropriate bin argument for your use case. counts of the ‘+’ nucleotide participate in voting for min_nucleotide_depth and min_minor_allele_depth functionality. Here are some important points regarding bin arguments: • query_bins vs. Because ‘+’ is used as a nucleotide code to represent insertions in pileup. e. – From effective end of reads: use negative values and -Inf. -1 denotes the last position in a read. pileup relies on cut to derive bins. For left_bin the effective start is always the 5’ end (left-to-right as appear in the BAM file). ‘0’ should be used to create a bin that includes the first position. The “bin” arguments query_bins and left_bins allow users to differentiate pileup counts based on arbitrary non-overlapping regions within a read. but limits input to numeric values of the same sign (coerced to integers). To account for variable-length reads in BAM files. plus that of the non-insertion base) at a given position if the non-insertion base passes filter criteria. Details about insertions are effectively truncated such that each insertion is reduced to a single ‘+’ nucleotide. a bin that captures only the second nucleotide from the last position: query_bins=c(-3. use (+)Inf as the upper bound on a bin that extends to the end of all reads. cycle position within a read is effectively reversed. including +/-Inf. • Positive or negative bin values can be used to delmit bins based on the effective “start” or “end” of a read.e. To account for variable- length reads in BAM files. The position of an insertion is the position that precedes it on the reference sequence. Note: this means if include_insertions is TRUE a single read will contribute two nucleotide codes (+. -2). For this application. the nucleotide code and base quality within the insertion is not included. pileup obeys yieldSize on BamFile objects with some differences from how scanBam responds to yieldSize. Because cut produces ranges in the form (first. 0.36 pileup • Insertions and padding: pileup can include counts of insertion operations by setting include_insertions to TRUE.

pileup 37 • BamFiles with a yieldSize value set. phred2ASCIIOffset returns a named integer vector of the same length or number of characters in phred. nulceotide. yieldSize only limits the number of alignments processed from the BAM file each time the file is read. should not be used with other functions that accept a BamFile instance as input. PileupParam returns an instance of PileupParam class. • pos: integer(1). such that two queries of the same genomic region can be differentiated. extracted from SAM ‘SEQ’ field of alignment. The names are the ASCII encoding. Column details: • seqnames: factor. pileup will hold in reserve positions for which only partial data has been processed. Genomic position of base. See scanBam for more detailed explanation of SAM fields. once opened and used with pileup. Derived by offset from SAM ‘POS’ field of alignment. If a which argument is specified for the scanBamParam. • left_bin / query_bin: factor.org . Bin in which base appears. and left_bin / query_bin columns appear as dictated by arguments to PileupParam. • nucleotide: factor. as indicated by values of other columns. The which_label column is used to maintain grouping of results. ‘nucleotide’ of base. then non-decreasing order on columns pos.frame pileup will repeatedly process yieldSize additional records until at least one position can be returned to the user. • Like scanBam.frame is first by order of seqnames column based on the BAM index file. Create a new BamFile instance instead of trying to reuse.frame returned from pileup is not 1-to-1. Positions held in reserve will be returned upon subsequent calls to pileup when all the input for a given position has been processed. optional strand. Order of rows in data. with frequency (count) of unique combination. then nucleotide. SAM ‘RNAME’ field of alignment. • strand: factor. ‘strand’ to which read aligns. pileup uses an empty return object (a zero-row data. Value For pileup a data. a which_label column (factor in the form ‘rname:start-end’) is automatically included. then strand. and the values are the offsets to be used with min_base_quality. In order to avoid returning a zero-row data. Author(s) Nathaniel Hayden nhayden@fredhutch. • The correspondence between yieldSize and the number of rows in the data. • count: integer(1). • Sometimes reading yieldSize records from the BAM file does not result in any completed positions. Columns seqnames.frame) to indicate end- of-input.frame with 1 row per unique combination of differentiating column values that satisfied filter criteria. pos. and count always appear. Frequency of combination of differentiating fields. • pileup only returns genomic positions for which all input has been processed. Only a handful of alignments can lead to many distinct records in the result. then left_bin / query_bin.

pileupParam=p_param) head(res) table(res$strand. The more detailed examples ## of pileup use queries with a 'which' argument.RNAseqData.pileup(fl.bam.HNRNPC.frame") if(nrow(res) == 0L) break } close(bf) ## pileup for the first half of chr14 with default Pileup parameters ## 'which' argument: process only specified genomic range(s) sbp <. org/wiki/FASTQ_format#Encoding.chr14") fl <.pileup(fl) head(res) table(res$strand. scanBamParam=sbp. 53674770))) res <. library("RNAseqData. scanBamParam=sbp) head(res) table(res$strand.open(BamFile(fl. res$nucleotide) ## specify pileup parameters: include ambiguious nucleotides ## (the 'N' level in the nucleotide column of the data. IRanges(1.HNRNPC. the entire-file examples are ## included first to make them easy to find. see https://en.38 pileup See Also • Rsamtools • BamFile • ScanBamParam • scanBam • cut For the relationship between PHRED scores and ASCII encoding.pileup(fl. res$nucleotide) .frame) p_param <. yieldSize=5e4)) repeat { res <.pileup(bf) message(nrow(res).ScanBamParam(which=GRanges("chr14".bam.PileupParam(ignore_query_Ns=FALSE) res <.chr14_BAMFILES[1] ## no 'which' argument to ScanBamParam: process entire file at once res <.wikipedia. " rows in result data. Examples ## The examples below apply equally to pileup queries for specific ## genomic ranges (specified by the 'which' parameter of 'ScanBamParam') ## and queries across entire files. res$nucleotide) ## no 'which' argument to ScanBamParam with 'yieldSize' set on BamFile bf <.

pch=". only include bases in ## cycle positions 6-10 within read p_param <.. "Illumina 1. pileupParam=p_param) ## basic use of negative bins: single pivot. scanBamParam=sbp. min_base_quality=10.PileupParam(query_bins=c(0. res)) ## query_bins ## basic use of positive bins: single pivot.") ## ASCII offsets to help specify min_base_quality. distinguish_strands=FALSE. res$strand) ## Any combination of the filtering criteria is possible: let's say we ## want a "coverage pileup" that only counts reads with mapping ## quality >= 13. 10)) res <. pileupParam=p_param) head(res) res <. pileupParam=p_param) ## basic use of negative bins: drop nucleotides in last two cycle . -1)) res <.PileupParam(query_bins=c(-Inf. pileupParam=p_param) dim(res) ## reduced to our biologically interesting positions head(xtabs(count ~ pos + nucleotide. min_nucleotide_depth=4) res <. min_nucleotide_depth=7) res <.PileupParam(query_bins=c(5. Inf)) res <.PileupParam(min_minor_allele_depth=5. pileupParam=p_param) ## basic use of positive bins: simple range. quality of at ## least 10 on the Illumina 1. scanBamParam=sbp. and bases with quality >= 10 that appear at least 4 ## times at each genomic position p_param <. res. 10.PileupParam(distinguish_strands=TRUE. "count")] ## drop seqnames and which_label cols plot(count ~ pos. pileupParam=p_param) head(res) table(res$nucleotide. scanBamParam=sbp.3+ scale phred2ASCIIOffset(10. p_param <.pileup(fl. scanBamParam=sbp. distinguish_strand=FALSE) res <.pileup(fl. require a minimum frequency of 7 for a ## nucleotide at a genomic position to be included in results.pileup 39 ## Don't distinguish strand.pileup(fl.pileup(fl. p_param <. scanBamParam=sbp.pileup(fl. e. scanBamParam=sbp. -4. min_mapq=13.res[.g. count bases that appear in ## first 10 cycles of a read separately from the rest p_param <.PileupParam(distinguish_nucleotides=FALSE.pileup(fl.3+") ## Well-supported polymorphic positions (minor allele present in at ## least 5 alignments) with high map quality p_param <. min_mapq=40. count bases that appear in ## last 3 cycle positions of a read separately from the rest. c("pos".

PileupParam(query_bins=seq(1.20)) res <. Usage ## Constructors PileupFiles(files.pileup(fl.PileupParam(query_bins=c(0. scanBamParam=sbp. 18:19)]) PileupFiles Represent BAM files for pileup summaries. 18:19)]) ## fit on one screen print(t(t(xt) / colSums(xt))[ . and end segements. res) print(xt[ . res) print(xt) t(t(xt) / colSums(xt)) ## single-width bins: count each cycle within a read separately. scanBamParam=sbp. Hence. all bases ## outside of that are dropped.. . res) print(xt) t(t(xt) / colSums(xt)) ## cheap normalization for illustrative purposes ## 'implicit outer range': in contrast to c(0.PileupParam(query_bins=c(3. pileupParam=p_param) xt <. pileupParam=p_param) ## typical use: beginning. 16)) res <. p_param <. . 8-12 middle. param=ApplyPileupsParam()) . pileupParam=p_param) xt <.40 PileupFiles ## positions of each read p_param <. 12.pileup(fl. Inf)) ## note non-symmetric res <.xtabs(count ~ nucleotide + query_bin. -3)) res <. 12. c(1:3. 7. Inf). 12. p_param <. 7. p_param <.16].. >=13 end (actual cycle positions should be ## tailored to data in actual BAM files). middle. it is common for bases in the ## beginning and end segments of each read to be less reliable.. pileup ## makes it easy to count each segment separately. pileupParam=p_param) xt <.xtabs(count ~ nucleotide + query_bin. the implicit outer range is (3.pileup(fl. c(1:3. but know ## that the first three cycles and any bases beyond the 16th cycle are ## irrelevant. to be used for calcu- lating pile-up summaries. scanBamParam=sbp. because of the ## nature of sequencing technology.pileup(fl. Description Use PileupFiles() to create a reference to a BAM files (and their indicies). 7. and end segments.. middle. ## Assume determined ahead of time that for the data 1-7 makes sense for ## beginning.xtabs(count ~ nucleotide + query_bin. param=ApplyPileupsParam()) ## S4 method for signature 'character' PileupFiles(files.PileupParam(query_bins=c(-Inf. scanBamParam=sbp... suppose we ## still want to have beginning.

. It has the following fields: files A list of BamFile instances..) ## S3 method for class 'PileupFiles' close(con. rw character() indicating mode of file. object An instance of PileupFiles. Objects from the Class Objects are created by calls of the form PileupFiles().. param An instance of ApplyPileupsParam... .. Additional arguments. also path() ## S4 method for signature 'PileupFiles' isOpen(con. FUN. All elements of . Fields The PileupFiles class is implemented as an S4 reference class. param=ApplyPileupsParam()) ## opening / closing ## S3 method for class 'PileupFiles' open(con. FUN.... must be the same type. Using a list of BamFile allows indicies to be specified when these are in non-standard format.. param An instance of ApplyPileupsParam. not used for TabixFile. and which summary information to return. FUN A function of one argument. . a character() or list of BamFile instances representing files to be included in the pileup.. param) ## S4 method for signature 'PileupFiles. .ApplyPileupsParam' applyPileups(files. .. param) ## display ## S4 method for signature 'PileupFiles' show(object) Arguments files For PileupFiles.) ## accessors... . ... rw="") plpFiles(object) plpParam(object) ## actions ## S4 method for signature 'PileupFiles. a PileupFiles instance. For applyPileups.PileupFiles 41 ## S4 method for signature 'list' PileupFiles(files. con.PileupFiles-method.. currently ignored. . see applyPileups.missing' applyPileups(files. to select which records to include in the pileup.

. The instance may be re-opened with open..only=FALSE) ## S4 method for signature 'character' quickBamFlagSummary(file. plpParam Returns the ApplyPileupsParam content of the PileupFiles instance. main. close. Author(s) Martin Morgan Examples example(applyPileups) quickBamFlagSummary Group the records of a BAM file based on their flag bits and count the number of records in each group Description quickBamFlagSummary groups the records of a BAM file based on their flag bits and counts the number of records in each group..PileupFiles Closes each file in the PileupFiles instance..groups... Accessors: plpFiles Returns the list of the files in the PileupFiles instance. invoking FUN on each range or collection of posi- tions. show Compactly display the object. . Usage quickBamFlagSummary(file..only=FALSE) . main. param=ScanBamParam().PileupFiles Opens the (local or remote) path and index of each file in the PileupFiles instance. .PileupFiles. returning (invisibly) the updated PileupFiles... param=ScanBamParam(). param=ScanBamParam(). Methods: applyPileups Calculate the pileup across all files in files according to criteria in param (or plpParam(files) if param is missing). main.groups.groups. index=file..only=FALSE) ## S4 method for signature 'list' quickBamFlagSummary(file. Returns a PileupFiles instance. See applyPileups.42 quickBamFlagSummary Functions and methods Opening / closing: open.

file("extdata". This determines which records are considered in the counting.. then the counting is performed for the main groups only.groups.. Examples bamfile <. mustWork=TRUE) quickBamFlagSummary(bamfile) readPileup Import samtools ’pileup’ files.readPileup 43 Arguments file. A summary of the counts is printed to the console unless redirected by sink. . . perhaps used by methods. and to the index file of the BAM file to read. file is a named list with elements “qname” and “flag” with content as from scanBam. variant=c("SNP".) ## S4 method for signature 'connection' readPileup(file. param An instance of ScanBamParam. Pages References http://samtools. main. . "indel". index For the character method.net/ See Also scanBam. package="Rsamtools". BamFile for a method for that class. Additional arguments. ScanBamParam. the path to the BAM file to read. respectively. "ex1.only If TRUE.. Value Nothing is returned.sourceforge.. Description Import files created by evaluation of samtools’ pileup -cv command. Author(s) H. For the list() method..bam".. "all")) . Usage readPileup(file.system..

bam" p <.readPileup(p.net/ Examples fl <. consen- susQuality..file("extdata". of the pileup output file to be parsed. and arguments passed to read..] ## Not run: ## uses a pipe. additionalIndels The number of additional indels present. nrow=100. consensusBase. consensus. variant Type of variant to parse. snpQuality.character-method.readPileup(fl)) xtabs(~referenceBase + consensusBase. passed to methods. variant="indel") all <. For instance. maxMappingQuality. or connection. Additional arguments. alleleOneSupport. The value returned by variant="indel" contains space. nrow=100) # variant="SNP" indel <. mustWork=TRUE) (res <. "r") snp <.readPileup(p. alleleTwoSupport The number of reads supporting each allele. alleleTwo The first (typically. mcols(res))[DNA_BASES."samtools pileup -cvf human_b36_female. reference. snpQuality: The phred-scaled SNP quality (probability of the consensus being identical to the reference).sourceforge. Value readPileup returns a GRanges object. "pileup. as determined by samtools pileup. Author(s) Sean Davis References http://samtools. coverage: The number of reads covering the site. and coverage fields. specify variant for the readPileup. position.44 readPileup Arguments file The file name.fa. in the reference sequence) and second allelic variants.gz na19240_3M. and: alleleOne.system.txt". maxMappingQuality: The root mean square mapping quality of reads overlapping the site. The consensus nucleotide. select one. variant="all") . package="Rsamtools". . referenceBase: The nucleotide in the reference sequence. The value returned by variant="SNP" or variant="all" contains: space: The chromosome names (fastq ids) of the reference sequence position: The nucleotide position (base 1) of the variant.table ## three successive piles of 100 records each cmd <.pipe(cmd.readPileup(p. consensusQuality: The phred-scaled consensus quality. nrow=100.

... . yieldSize An integer(1) vector of the number of records to yield. It has the following fields: .. see.. it is not intended for direct use by users – see... ..) yieldSize(object.g. unused. Additional arguments. Objects from the Class Users do not directly create instances of this class. . Usage ## accessors ## S4 method for signature 'RsamtoolsFile' path(object.) <.RsamtoolsFile 45 ## End(Not run) RsamtoolsFile A base class for managing file references in Rsamtools Description RsamtoolsFile is a base class for managing file references in Rsamtools. value Replacement value. asNA = TRUE) ## S4 method for signature 'RsamtoolsFile' isOpen(con.) ## S4 method for signature 'RsamtoolsFile' index(object. Fields The RsamtoolsFile class is implemented as an S4 reference class.g.extptr An externalptr initialized to an internal structure with opened bam file and bam index pointers.. ignored. BamFile-class. . rw="") ## S4 method for signature 'RsamtoolsFile' yieldSize(object. object An instance of a class derived from RsamtoolsFile. index A character(1) vector of the index file name.. BamFile. . path A character(1) vector of the file name.value ## S4 method for signature 'RsamtoolsFile' show(object) Arguments con. asNA logical indicating if missing output should be NA or character() rw Mode of file.... e. e. .

. asNA = TRUE) ## S4 method for signature 'RsamtoolsFileList' isOpen(con.Return or set an integer(1) vector indicating yield size. Additional arguments..) ## S3 method for class 'RsamtoolsFileList' close(con.46 RsamtoolsFileList Functions and methods Accessors: path Returns a character(1) vector of path names. . .) ## S4 method for signature 'RsamtoolsFileList' names(x) ## S4 method for signature 'RsamtoolsFileList' yieldSize(object. Methods: isOpen Report whether the file is currently open. . ..) ## S4 method for signature 'RsamtoolsFileList' index(object... BamFileList... ..) Arguments con. ignored. index Returns a character(1) vector of index path names. .. e. asNA logical indicating if missing output should be NA or character() rw Mode of file. Author(s) Martin Morgan RsamtoolsFileList A base class for managing lists of Rsamtools file references Description RsamtoolsFileList is a base class for managing lists of file references in Rsamtools.g.. x An instance of a class derived from RsamtoolsFileList.... yieldSize<. yieldSize.. . it is not intended for direct use – see. rw="") ## S3 method for class 'RsamtoolsFileList' open(con. show Compactly display the object. object. Usage ## S4 method for signature 'RsamtoolsFileList' path(object.

isNotPrimaryRead = NA. if names are NULL.value bamReverseComplement(object) bamReverseComplement(object) <. isDuplicate = NA) scanBamWhat() # Accessors bamFlag(object. simpleCigar = FALSE. isProperPair = NA. Author(s) Martin Morgan ScanBamParam Parameters for scanning BAM files Description Use ScanBamParam() to create a parameter object influencing what fields and which records are imported from a (binary) BAM file.bai) exists. Use of which requires that a BAM index file (<filename>. BamFileList-class.ScanBamParam 47 Objects from the Class Users do not directly create instances of this class. isMinusStrand = NA. e.g. asInteger=FALSE) bamFlag(object) <. close: Attempt to close each file in the list. which. see. isSecondMateRead = NA. open: Attempt to open each file in the list. isUnmappedQuery = NA. isSecondaryAlignment = NA.. names: Names of each element of the list or. updating. Usage # Constructor ScanBamParam(flag = scanBamFlag(). isMateMinusStrand = NA. mapqFilter=NA_integer_) # Constructor helpers scanBamFlag(isPaired = NA. Methods: isOpen: Report whether each file in the list is currently open. tag = character(0). the basename of the path of each element. hasUnmappedMate = NA. isNotPassingQualityControls = NA. reverseComplement = FALSE.value . tagFilter = list(). what = character(0). isFirstMateRead = NA. Functions and methods This class inherits functions and methods for subseting. and display from the SimpleList class.

NAs. Tags are identified by two-letter codes. The name of each list element is the tag name (two-letter code).value ## S4 method for signature 'ScanBamParam' show(object) # Flag utils bamFlagAsBitMatrix(flag. so all elements of tag must have exactly 2 characters. This might be useful if wishing to recover read and quality scores as represented in fastq files. BAM files store reads mapping to the minus strand as though they are on the plus strand. For bamFlagAsBitMatrix. bamFlagTest an integer vector where each element represents a ’flag’ entry. stored with each record. Only reads with specified tags are included. and the corresponding atomic vector is the set of ac- ceptable values for the tag. tag A character vector naming tags to be extracted. what A character vector naming the fields to return scanBamWhat() returns a vector of available fields. but when this value is set to TRUE returns the sequence and quality scores of reads mapped to the minus strand in the re- verse complement (sequence) and reverse (quality) of the read as stored in the BAM file.value bamWhat(object) bamWhat(object) <. but is NOT appropriate for variant calling or other alignment-based operations.value bamTagFilter(object) bamTagFilter(object) <. Fields are described on the scanBam help page. and empty strings are not allowed in the atomic vectors. an integer(2) vector used to filter reads based on their ’flag’ entry. NULLs. returns only those reads for which the cigar (run-length encoded representation of the alignment) is missing or contains only matches / mismatches ('M'). with arbitrary information. flag2) bamFlagTest(flag. Rsamtools obeys this convention by de- fault (reverseComplement=FALSE). reverseComplement A logical(1) vectors. BAM records with mapping qualities less than mapqFilter are discarded. bitnames=FLAG_BITNAMES) bamFlagAND(flag1.value bamTag(object) bamTag(object) <. simpleCigar A logical(1) vector which. mapqFilter A non-negative integer(1) specifying the minimum mapping quality to include. . A tag is an optional field.48 ScanBamParam bamSimpleCigar(object) bamSimpleCigar(object) <.value bamMapqFilter(object) bamMapqFilter(object) <. when TRUE. tagFilter A named list of atomic vectors.value bamWhich(object) bamWhich(object) <. This is most easily created with the scanBamFlag() helper function. value) Arguments flag For ScanBamParam.

A properly paired read is de- fined by the alignment algorithm and might. from which a IRangesList instance will be constructed. A non-primary alignment (“secondary alignment” in the SAM specification) might result when a read aligns to multiple locations. Names of the IRangesList correspond to reference sequences. or any (NA) strand should be returned. minus (TRUE). for bamFlagTest.. or any (NA) read should be returned. which does not include strand informa- tion (use the flag argument instead). or any (NA) read should be returned. or any (NA) read should be returned. from the ar- guments to scanBamFlag. Only records with a read overlapping the specified ranges are returned. . RangesList. isPaired A logical(1) indicating whether unpaired (FALSE).g. and ranges to the regions on that reference sequence for which matches are desired. isNotPassingQualityControls A logical(1) indicating whether reads passing quality controls (FALSE). isUnmappedQuery A logical(1) indicating whether unmapped (TRUE). duplicated (TRUE). properly paired (TRUE). isDuplicate A logical(1) indicating that un-duplicated (FALSE). unmapped (TRUE). mapped (FALSE).ScanBamParam 49 which A GRanges. or any (NA) strand should be returned. isNotPrimaryRead Deprecated. use isSecondaryAlignment. to be assigned to object or. isFirstMateRead A logical(1) indicating whether the first mate read should be returned (TRUE) or not (FALSE). reads not passing quality controls (TRUE). for which this flag is TRUE. e. isMinusStrand A logical(1) indicating whether reads aligned to the plus (FALSE). or any (NA) mate should be returned. or any object that can be coerced to a RangesList. or whether mate read number should be ignored (NA). or any (NA) reads should be returned. object An instance of class ScanBamParam. isProperPair A logical(1) indicating whether improperly paired (FALSE). are designated by the aligner as secondary. isSecondMateRead A logical(1) indicating whether the second mate read should be returned (TRUE) or not (FALSE).. Because data types are coerced to IRangesList. represent reads aligning to identical reference sequences and with a specified distance. are not (FALSE) or whose secondary status does not matter (NA) should be returned. or whether mate read number should be ignored (NA). All ranges must have ends less than or equal to 536870912. paired (TRUE). a character(1) name of the flag to test. value An instance of the corresponding slot.g. hasUnmappedMate A logical(1) indicating whether reads with mapped (FALSE). or missing object. the remainder. e. “isUnmappedQuery”. minus (TRUE). or any (NA) read should be returned. ’Duplicated’ reads may represent PCR or optical duplicates. isSecondaryAlignment A logical(1) indicating whether alignments that are secondary (TRUE). One alignment is designated as primary and has this flag set to FALSE. isMateMinusStrand A logical(1) indicating whether mate reads aligned to the plus (FALSE).

and the set of acceptable values for each tag. Slots flag Object of class integer encoding flags to be kept when they have their ’0’ (keep0) or ’1’ (keep1) bit set. bamReverseComplement<. bamTagFilter. bamWhat<. bamWhat. flag1. that only ’simple’ cigars (empty or ’M’) are returned.Returns or sets a named list of tags to filter by. bamFlag<. bamTagFilter<. Objects from the Class Objects are created by calls of the form ScanBamParam(). bamTag<. bamReverseComplement. and the set of their acceptable values.Returns or sets a logical(1) vector in- dicating whether reads on the minus strand will be returned with sequence reverse comple- mented and quality reversed. reverseComplement Object of class logical indicating. that reads on the minus strand are to be reverse complemented (sequence) and reversed (quality). . bamSimpleCigar. bitnames Names of the flag bits to extract. tag Object of class character indicating what tags are to be returned.Returns or sets an integer(2) representation of reads flagged to be kept or excluded. bamSimpleCigar<. Methods: show Compactly display the object. when TRUE. bamFlag.Returns or sets a logical(1) vector indicating whether reads without indels or clipping be kept. The which argument to the constructor can be one of several different types. Will be the colnames of the returned matrix.50 ScanBamParam asInteger logical(1) indicating whether ‘flag’ should be returned as an encoded integer vector (TRUE) or human-readable form (FALSE). what Object of class character indicating what fields are to be returned. which Object of class RangesList indicating which reference sequence and coordinate reads must overlap. Constructor: ScanBamParam: Returns a ScanBamParam object. bamWhich. Functions and methods See ’Usage’ for details on invocation.Returns or sets a character vector of tags to be extracted. Accessors: bamTag. flag2 Integer vectors containing ‘flag’ entries. bamWhich<. or NA to indicate no filtering.Returns or sets a character vector of fields to be extracted. as documented above.Returns or sets a RangesList of bounds on reads to be extracted. A length 0 RangesList represents all reads. mapqFilter Object of class integer indicating the minimum mapping quality required for input. when TRUE. tagFilter Object of class list (named) indicating tags to filter by. simpleCigar Object of class logical indicating.

isMinusStrand=TRUE) p6 <. c(1000.ScanBamParam(tag=c("NM".ScanBamParam(what="flag") bam6 <.RangesList(seq1=IRanges(1000. "H1").ScanBcfParam-class 51 Author(s) Martin Morgan See Also scanBam Examples ## defaults p0 <.ScanBamParam(tag=c("NM".ScanBamParam(what=scanBamWhat().ScanBamParam(what=c("rname". 2000))) p1 <.ScanBamParam() ## subset of reads based on genomic coordinates which <. head) ## tags.scanBamFlag(isUnmappedQuery=FALSE.ScanBamParam(what=scanBamWhat(). seq2=IRanges(c(100.scanBam(fl. param=p4) str(bam4[[1]][["tag"]]) ## tagFilter p5 <. NM: edit distance. "H1").scanBam(fl. "qwidth")) ## use fl <. param=p5) table(bam5[[1]][["tag"]][["NM"]]) ## flag utils flag <. 2000). what="flag") bam4 <. tagFilter=list(NM=c(2.scanBam(fl. which=which) ## subset of reads based on 'flag' value p2 <. 4))) bam5 <. flag=scanBamFlag(isMinusStrand=FALSE)) ## subset of fields p3 <. "ex1. H1: 1-difference hits p4 <. param=p6) flag6 <. package="Rsamtools".scanBam(fl.bam". 1000). "pos". "strand". 3.bam6[[1]][["flag"]] head(bamFlagAsBitMatrix(flag6[1:9])) colSums(bamFlagAsBitMatrix(flag6)) flag bamFlagAsBitMatrix(flag) ScanBcfParam-class Parameters for scanning BCF files . mustWork=TRUE) res <.system. param=p2)[[1]] lapply(res.file("extdata".

. Usage ScanBcfParam(fixed=character().. samples=character().. object An instance of class ScanBcfParam. which An object.) ## S4 method for signature 'RangesList' ScanBcfParam(fixed=character(). trimEmpty=TRUE. samples=character(). samples=character(). trimEmpty=TRUE. geno=character(). samples=character(). trimEmpty A logical(1) indicating whether ‘GENO’ fields with no values should be re- turned. geno=character(). Variants whose POS lies in the interval(s) [start. end) are returned..bci) exists. above).) ## Accessors bcfFixed(object) bcfInfo(object) bcfGeno(object) bcfSamples(object) bcfTrimEmpty(object) bcfWhich(object) Arguments fixed A logical(1) for use with ScanVcfParam only. describing the sequences and ranges to be queried. ..52 ScanBcfParam-class Description Use ScanBcfParam() to create a parameter object influencing the ‘INFO’ and ‘GENO’ fields parsed.. NA_character_ returns none. trimEmpty=TRUE.. geno A character() vector of ‘GENO’ fields (see scanVcfHeader) to be returned.. . for which a method is defined (see usage. geno=character(). samples A character() vector of sample names (see scanVcfHeader) to be returned. info A character() vector of ‘INFO’ fields (see scanVcfHeader) to be returned. which. which.. info=character(). NA_character_ returns none. character(0) returns all fields. info=character().) ## S4 method for signature 'GRanges' ScanBcfParam(fixed=character().. Use of which requires that a BCF index file (<filename>. . geno=character().. . . geno=character(). which. trimEmpty=TRUE.. info=character(). Arguments used internally. samples=character(). which.) ## S4 method for signature 'missing' ScanBcfParam(fixed=character(). character(0) returns all fields.. which. and which sample records are imported from a BCF file. trimEmpty=TRUE. info=character(). info=character().) ## S4 method for signature 'GRangesList' ScanBcfParam(fixed=character(). .

fixed: Object of class "character". For use with ScanVcfParam only. as documented above. Author(s) Martin Morgan mtmorgan@fhcrc. bcfTrimEmpty. Constructor: ScanBcfParam: Returns a ScanBcfParam object. Slots which: Object of class "RangesList" indicating which reference sequence and coordinate variants must overlap. trimEmpty: Object of class "logical" indicating whether empty ‘GENO’ fields are to be returned. info: Object of class "character" indicating portions of ‘INFO’ to be returned. bcfWhich: Return the corresponding field from object.org See Also scanVcf ScanVcfParam Examples ## see ?ScanVcfParam examples . Accessors: bcfInfo. bcfGeno. Methods: show Compactly display the object. samples: Object of class "character" indicating the samples to be returned.ScanBcfParam-class 53 Objects from the Class Objects can be created by calls of the form ScanBcfParam(). The which argument to the constructor can be one of several types. geno: Object of class "character" indicating portions of ‘GENO’ to be returned. Functions and methods See ’Usage’ for details on invocation.

"example. the reference remains open across calls to methods. ... .. returning the names of the ‘sequences’ used as a key when creating the file. package="Rsamtools". .54 TabixFile seqnamesTabix Retrieve sequence names defined in a tabix file. TabixFileList() provides a convenient way of managing a list of TabixFile instances. Value A character() vector of sequence names present in the file.org>. ..file("extdata". Usage seqnamesTabix(file.system.) ## S4 method for signature 'character' seqnamesTabix(file. currently ignored. Author(s) Martin Morgan <mtmorgan@fhcrc.. Description Use TabixFile() to create a reference to a Tabix file (and its index).) Arguments file A character(1) file path or TabixFile instance pointing to a ‘tabix’ file. avoiding costly index re-loading. Additional arguments. Once opened. Description This function queries a tabix file.gz". mustWork=TRUE) seqnamesTabix(fl) TabixFile Manipulate tabix indexed tab-delimited files. Examples fl <.gtf..

yieldSize() ## S4 method for signature 'TabixFile' isOpen(con... ... param) ## S4 method for signature 'character..missing' scanTabix(file.. .. a character(1) or TabixFile instance... For others. .) ## S3 method for class 'TabixFile' close(con. Only valid when param is unspecified.. can be remote (http://.) ## S4 method for signature 'TabixFile. include NA. . index = paste(file. index A character(1) vector of the tabix file index.) ## Opening / closing ## S3 method for class 'TabixFile' open(con. "tbi". . .."). when creating a TabixFileList from TabixFile instances.GRanges' scanTabix(file.TabixFile 55 Usage ## Constructors TabixFile(file... param) ## S4 method for signature 'TabixFile. file For TabixFile(). used to select which records to scan.. ftp://).ANY' scanTabix(file. a TabixFile instance. yieldSize=NA_integer_) TabixFileList(. yieldSize Number of records to yield each time the file is read from using scanTabix. also path()..... index().. . rw="") ## actions ## S4 method for signature 'TabixFile' seqnamesTabix(file.. A character(1) vector to the tabix file path.missing' scanTabix(file.. sep=".) ## accessors. param An instance of GRanges or RangesList. . For countTabix.....RangesList' scanTabix(file. param) ## S4 method for signature 'TabixFile.. yieldSize does not alter existing yield sizes. . . .. param) ## S4 method for signature 'character.) Arguments con An instance of TabixFile. .. param) countTabix(file.) ## S4 method for signature 'TabixFile' headerTabix(file...

close. yieldSize. index. to index tab- separated files. returning the sequence names present in the file. NA indi- cates that all records are to be parsed. Functions and methods TabixFileList inherits methods from RsamtoolsFileList and SimpleList. and yieldSize arguments. and the header (preceeded by comment character. this can include file. show Compactly display the object. The arguments can also include yieldSize. Opening / closing: open. See indexTabix. Methods: seqnamesTabix Visit the path in path(file). returning the sequence names. The in- stance may be re-opened with open. Visit the path in path(file). returning the re- sult of scanTabix applied to the specified path. headerTabix Visit the path in path(file).TabixFile. Fields The TabixFile class inherits fields from the RsamtoolsFile class. call the corresponding method after coercing file to TabixFile. not used for TabixFile.. For signature(file="character"). Objects from the Class Objects are created by calls of the form TabixFile(). yieldSize determines the number of records parsed during each call to scanTabix. Accessors: path Returns a character(1) vector of the tabix path name. Additional arguments. scanTabix For signature(file="TabixFile"). ap- plied to all elements of the list. rw character() indicating mode of file. yieldSize<.TabixFile Closes the TabixFile con. The file can be a single character vector of paths to tabix files (and optionally a similarly lengthed vector of index). returning (invisibly) the updated TabixFile.Return or set an integer(1) vector indicating yield size. rather than TabixFile objects. index Returns a character(1) vector of tabix index name.TabixFile Opens the (local or remote) path and index. or the count of all records in the file (when param is missing). countTabix Return the number of records in each range of param. the number of lines skipped while indexing. Returns a TabixFile instance.. at start of file) lines. indexTabix This method operates on file paths. Author(s) Martin Morgan . the comment character used while indexing. column indicies used to sort the file.56 TabixFile . or several in- stances of TabixFile objects. For TabixFileList.

.frame's dff <. .file("extdata". param) ## S4 method for signature 'character.gz". res) dff[["chr1:1-100000"]][1:5. tab-delimited) files.open(TabixFile(fl..csv(textConnection(elt). mustWork=TRUE) tbx <. param=param) sapply(res..Map(function(elt) { read.system.scanTabix(tbx)[[1]])) cat("records read:".GRanges' scanTabix(file. yieldSize=100)) while(length(res <. 1). .GRanges(c("chr1". length(res). "chr2").gtf.TabixInput 57 Examples fl <.. tab-delimited files. IRanges(c(1. tabix-indexed.. or more flexibly an instance of class TabixFile.RangesList' scanTabix(file. width=100000)) countTabix(tbx) countTabix(tbx. package="Rsamtools". param) Arguments file The character() file name(s) of the tabix file be processed.scanTabix(tbx. . sep="\t".. header=FALSE) }. param) ## S4 method for signature 'character. .. sorted. Usage scanTabix(file. "\n") close(tbx) TabixInput Operations on ‘tabix’ (indexed..TabixFile(fl) param <... Description Scan compressed.1:8] ## parse 100 records at a time length(scanTabix(tbx)[[1]]) # total number of records tbx <. length) res[["chr1:1-100000"]][1:2] ## parse to list of data... "example. currently ignored. Additional arguments. param=param) res <. param A instance of GRanges or RangesList providing the sequence names and re- gions to be parsed.

report the occurrence of paired-end reads to the user.g. with one element per region.. 8. Author(s) Martin Morgan <mtmorgan@fhcrc. when path points to a plain text file but index is present.. scanTabix_io A read error occured while inputing the tabix file. 4. index=file. Additional arguments. BGZF_ERR_HEADER.org>.sourceforge.net/tabix. Value A logical vector of length 1 containing TRUE is returned if BAM file contained paired end reads. Usage testPairedEndBam(file. or a BamFile instance. FALSE otherwise. BGZF_ERR_MISUSE.shtml Examples example(TabixFile) testPairedEndBam Quickly test if a BAM file has paired end reads Description Iterate through a BAM file until a paired-end read is encountered or the end of file is reached. index (optional) character(1) name of the index file of the ’BAM’ file being processed. This might be because the file is corrupt. currently unused. Each element of the list is a character vector representing records in the region.. implying that path should be a bgziped file. References http://samtools. this is given without the ’.bai’ extension. Open BamFiles are closed.. BGZF_ERR_ZLIB. . . The error message may include an error code representing the logical OR of these cryptic signals: 1. 2. or of incorrect format (e.) Arguments file character(1) BAM file name.58 testPairedEndBam Value scanTabix returns a list.. The following errors are signaled: scanTabix_param yieldSize(file) must be NA when more than one range is specified. their yield size is respected when iterating through the file. BGZF_ERR_IO. . Error scanTabix signals errors using signalCondition.

testPairedEndBam 59 Author(s) Martin Morgan mailto:mtmorgan@fhcrc. Sonali Arora mailto:sarora@fhcrc.org Examples fl <.bam".org. package="Rsamtools") testPairedEndBam(fl) .system.file("extdata". "ex1.

47 [.(BamFile). 26 BamSampler-class (deprecated).missing-method bamPaths (BamViews).BamFile-method (BamFile).missing-method bamSamples<.(ScanBamParam).(ScanBamParam). 23 ApplyPileupsParam. 7 FaFile. 21 asMates. 5 asBcf. 47 indexTabix. 30 47 applyPileups. 47 asSam (BamInput). 7 BcfFile. AAStringSet.(BamViews). 7 RsamtoolsFile. 7. 13 BamFile. 7 headerTabix. 26 applyPileups. 3. 42 BamSampler (deprecated).(ScanBamParam). 13 Rsamtools-package. 23 BamFile. 43.character-method (BamInput). 15. 2 bamMapqFilter (ScanBamParam). 40 bamSimpleCigar (ScanBamParam). 12. 25 BamFileList. 18 (BamViews). 5. 42 bamFlagAND (ScanBamParam). 7. 7 FaInput.BamFile-method (BamFile). 54 ∗Topic manip bamDirname<.BamFileList-method (BamFile). 45 asMates<-. 23 BamFile-class (BamFile). 57 bamIndicies (BamViews). 7 asMates (BamFile). 18 bamMapqFilter<.character-method (BcfInput). 26 BamFileList (BamFile).(ScanBamParam).(ScanBamParam). 46 7 ScanBamParam. 18 bamRanges (BamViews). RsamtoolsFileList. 18 asMates.ANY-method (BamViews).(ScanBamParam). 5 bamTag<. 47 asBam. 46. 47 bamReverseComplement<. 47 (ApplyPileupsParam). 47 deprecated. 7 PileupFiles.(BamViews). 47 [.Index ∗Topic classes asBcf (BcfInput). 18 bamReverseComplement (ScanBamParam). 47 ApplyPileupsParam. 32 bamFlag<. 18 [. 47 quickBamFlagSummary. 40 bamSamples (BamViews). 3 bamExperiment (BamViews). 47 TabixInput. 47 asBam (BamInput).BamViews. 33. 27 asMates<. 18 applyPileups. 31 bamFlag (ScanBamParam).BamViews.BamViews. 47 readPileup.PileupFiles. 58 BcfInput. 41. 12. 43 bamFlagAsBitMatrix (ScanBamParam). 13 TabixFile. 7 Compression.BamFileList-method (BamFile).PileupFiles. 47 60 . 13 ScanBcfParam-class. 45. 18 ∗Topic package BamInput. 4. 47 seqnamesTabix. 41. 29 BamFileList-class (BamFile). 41. 42 bamSimpleCigar<. 18 (PileupFiles). 18 applyPileups. 13 bamTagFilter<. 18 (BamViews). 38. 54 bamFlagTest (ScanBamParam).character-method (BamInput).missing.ApplyPileupsParam-method (PileupFiles). 9.ANY.(BamViews). 47 ApplyPileupsParam-class bamTag (ScanBamParam). 13 bamTagFilter (ScanBamParam). 18 BamInput.ANY-method bamRanges<. 40 asMates<-. 51 asSam. 7 BamViews.ANY.

RsamtoolsFileList-method deprecated. 46 (TabixFile).GRanges-method (BamViews). 26 (RsamtoolsFileList). 2 close. 51 getSeq. 29 index. 31 close. 19.character-method (BamInput).FaFile-method (FaFile). 47 18 bamWhich (ScanBamParam). 18 dim.(RsamtoolsFile). 29.BamViews-method (BamViews). 21 findSpliceOverlaps-methods.FaFile-method (FaFile). 11 idxstatsBam.TabixFile (TabixFile).character-method (BamInput). 27 index. 29 BcfFile-class (BcfFile). 49 bzfile-class (Rsamtools-package).missing-method (BamViews).GRanges-method FaFile. 18 dimnames.PileupFiles (PileupFiles).ANY-method DNAStringSet. 33 countBam. 13 countBam.(ScanBamParam). 44. 24. 7 include_deletions (pileup). 45 DataFrame.character-method close. 21.INDEX 61 BamViews.character-method countBam (BamInput). 14 bcfMode (BcfFile). 21 filterBam (BamInput). 45 countFa (FaInput). 47 distinguish_strands (pileup). 2 BcfFileList (BcfFile).BamFile-method (BamFile). 18 (BamViews). 54 (RsamtoolsFileList). 45 cycle_bins (pileup). 28–30. 23 FilterRules. 21 fifo-class (Rsamtools-package). 27 bgzip (Compression).BamFileList-method (BamFile). 13 connection. 25 GRanges.TabixFile-method (RsamtoolsFileList). 33 countBam. 44 idxstatsBam. 54 close. 33 countBam. 47 bamWhich<-. 10. 30 (ScanBamParam). 7 hasUnmappedMate (ScanBamParam). 47 bcfTrimEmpty (ScanBcfParam-class). bcfInfo (ScanBcfParam-class). 26.ScanBamParam. 18 BamViews. 33 bamWhich<-.RsamtoolsFile-method countFa.FaFile (FaFile).ScanBamParam. bamWhat<. 27 (ScanBamParam).BcfFile (BcfFile). 7 ignore_query_Ns (pileup).BamViews-method (BamViews). 13 bcfFixed (ScanBcfParam-class).RsamtoolsFileList headerTabix. 31 close. 18 bamWhat (ScanBamParam). 21 headerTabix. 13 (BamInput). 45 countFa. 2 gzfile-class (Rsamtools-package). 29 (RsamtoolsFile). 27 bcfWhich (ScanBcfParam-class). 20.BamViews-method (BamViews). 13 index (RsamtoolsFile). 33 bamWhich<. 51 getSeq.BamViews-method (BamViews). 18 BamViews. 19. 27 (ScanBamParam).ANY-method BamViews-class (BamViews). 47 FaFileList-class (FaFile). 51 filterBam.BamViews.FaFileList-method (FaFile). 51 13 BcfInput. 27 bamWhich<-.RsamtoolsFileList-method countTabix (TabixFile). 40 (headerTabix).RangesList-method FaFileList (FaFile).ScanBamParam. 34.RsamtoolsFile-method (RsamtoolsFile). 18 dimnames<-. 47 dimnames<-.(ScanBamParam). 46 . 25 FaInput. 7 countBam. 11 BcfFileList-class (BcfFile).BamFile-method (BamFile). 47 distinguish_nucleotides (pileup).character-method (FaInput). 7 bcfGeno (ScanBcfParam-class). 27 BcfFile. 20 index<-. 27 headerTabix. 38 index<. 51 FLAG_BITNAMES (ScanBamParam). 33 index<-. 54 Compression.BamFile-method (BamFile). 36. 12 bcfSamples (ScanBcfParam-class). 18 include_insertions (pileup). 51 filterBam. 46 cut. 25 idxstatsBam (BamInput).BamFile (BamFile). 47 close. 21 filterBam. 47 FaFile-class (FaFile).

5 max_depth (pileup).BamFile-method (BamFile).BcfFile-method (BcfFile). 47 PileupFiles. 45 isOpen.RsamtoolsFile-method pileup. 5 min_mapq (pileup). 13 (RsamtoolsFileList).BamFile-method (BamFile).62 INDEX indexBam. 40 phred2ASCIIOffset (pileup).(ApplyPileupsParam).(BamFile).BcfFile-method (BcfFile). 13 indexBcf (BcfInput). 29 obeyQname<. 23 obeyQname. 3. 21 (RsamtoolsFile).FaFile-method (FaFile). 40 isProperPair (ScanBamParam). 47 isNotPrimaryRead (ScanBamParam). 35 (RsamtoolsFileList).character-method (BamInput). 7 isDuplicate (ScanBamParam).character-method isPaired (ScanBamParam). 5 plpFlag<. 7 indexFa. 33 pipe-class (Rsamtools-package). 45 pileup. isSecondaryAlignment. 33 plpMinMapQuality<. 5 mergeBam (BamInput).RsamtoolsFileList-method pileup.(ApplyPileupsParam).BamFileList-method (BamFile). 7 indexBcf. 33 plpFlag (ApplyPileupsParam). 56 obeyQname<-.BamFile-method (BamFile).character-method (BcfInput). 46 PileupFiles.PileupFiles-method (RsamtoolsFileList). 33 isOpen.BamFileList-method (BamFile). 33 isUnmappedQuery.FaFile-method (FaFile). 33 (RsamtoolsFileList). 33 plpMaxDepth (ApplyPileupsParam). 46 isNotPassingQualityControls open.RsamtoolsFile-method isOpen. 27 path.RsamtoolsFileList isNotPassingQualityControls. 40 isSecondMateRead (ScanBamParam).TabixFile (TabixFile). 33 plpMinDepth<. 33 (RsamtoolsFile).BamViews-method (BamViews).(ApplyPileupsParam). 23 obeyQname (BamFile). 47 open. 5 names.FaFile (FaFile). 13 plpMinBaseQuality (ApplyPileupsParam). 7 open. 5 min_nucleotide_depth (pileup). 7 plpMinBaseQuality<- mergeBam. 2 left_bins. 47 path (RsamtoolsFile).RsamtoolsFileList-method isOpen. 18 indexBam.BamFileList-method isDuplicate. 40 left_bins (pileup). 54 PileupFiles. 13 (ApplyPileupsParam).BamFile-method (BamFile).character-method (FaInput). 47 (PileupFiles). 21 obeyQname. 54 (ScanBamParam). 35 40 isSecondaryAlignment (ScanBamParam). 40 isMinusStrand (ScanBamParam).PileupFiles (PileupFiles). 21 isIncomplete. 34 plpFiles (PileupFiles).BamFile-method (BamFile). 7 names<-. 11 plpMaxDepth<. 47 open. 47 PileupParam. 27 isMateMinusStrand (ScanBamParam). 35 (BamFile). 7 isFirstMateRead (ScanBamParam). 7 indexTabix. 18 plpParam (PileupFiles).character-method (BamInput). 27 obeyQname<-. 7 indexBcf. 33 plpMinMapQuality (ApplyPileupsParam).BamFile-method (pileup). 33 plpMinDepth (ApplyPileupsParam). 5 min_base_quality (pileup). 5 min_minor_allele_depth (pileup).list-method (PileupFiles). 45 isOpen. 40 isOpen. 47 open. 5 mergeBam. 29 7 indexFa. indexFa (FaInput). 5 mergeBam. 11 names.BamViews-method (BamViews). 40 . 33 isOpen. 32. 47 PileupFiles-class (PileupFiles).(ApplyPileupsParam). 33 isUnmappedQuery (ScanBamParam).RsamtoolsFileList-method indexBam (BamInput). 34 PileupParam (pileup). 7 path.BcfFile (BcfFile). 47 PileupParam-class (pileup). 47 open.character-method (pileup). 46 (PileupFiles).TabixFile-method (TabixFile).BamFile (BamFile). 46 indexBam.

43 ScanBcfParam (ScanBcfParam-class).character-method ScanBcfParam. 38. 5 RsamtoolsFileList. 47 qnameSuffixStart<.GRangesList-method Rsamtools. 45. (BamFile).BamFile-method (ScanBamParam). 7 ScanBamParam-class (ScanBamParam). 7 scanBam. 25 scanBcfHeader (BcfInput). 47 query_bins. 42 scanBamWhat (ScanBamParam).BamFileList-method (BamFile). 10. 7 qnameSuffixStart<-.INDEX 63 plpWhat (ApplyPileupsParam). 33.BamFile-method ScanBamParam. 56 plpYieldAll<. 15. 13 qnameSuffixStart. 5 scanBam. 5 Rsamtools-package. 11. 12 21 readGAlignmentsList. 18 (BamFile). 38 (ScanBcfParam-class). 5 (RsamtoolsFileList). qnameSuffixStart (BamFile). 7 26 qnamePrefixEnd. 28.character-method scanBamWhat. 10. 23 (quickBamFlagSummary).BamSampler-method (deprecated).ANY-method (ScanBamParam). 7.BcfFile-method (BcfFile). 2 plpWhich (ApplyPileupsParam). 47 quickBamFlagSummary.BamFile-method scanBam.(ApplyPileupsParam). 17 qnamePrefixEnd<-. 7 ScanBamParam. 51 . 7 scanBam. 11. 48. 7 7 qnameSuffixStart. 23 readPileup. 30 ScanBcfParam. 2. 33 (ScanBamParam). 18.GRanges-method (readPileup).BamFileList-method scanBam. 34. 12 scanBcfHeader. 13 (BamFile). 12 scanBcfHeader. (BamFile). 28.character-method (BamFile). 34 ScanBamParam.BamViews-method (BamViews). 22. 7 47 qnameSuffixStart<-. 24 (readPileup). 47 quickBamFlagSummary.(ApplyPileupsParam).BamFile-method (BamFile). 56 plpWhich<.GRanges-method (BamFile). 5.character-method (BamInput). 10. 42 scanBcf (BcfInput). 37. 43 (BcfInput). 2 plpWhat<. 47 (BamFile). 51 scanBam (BamInput). 13 qnamePrefixEnd<.BamFile-method scanBamHeader. 47 (BamFile). 17 (quickBamFlagSummary). 7 scanBamHeader. 7 (ScanBamParam). 46 plpYieldBy<. 5 RsamtoolsFile-class (RsamtoolsFile). 42 ScanBamParam. 5 scan.character-method readPileup. 43. 43 (ScanBcfParam-class). 7 (BamInput).BamFile-method scanBamFlag (ScanBamParam). 11 qnamePrefixEnd<-.connection-method ScanBcfParam.(ApplyPileupsParam). 10.(BamFile).(BamFile). 22.BamFile-method (BamFile).character-method (BcfInput). 51 RNAStringSet. 5 RsamtoolsFile. 5 Rsamtools (Rsamtools-package).BamFileList-method scanBamHeader (BamInput). 43.list-method scanBcf. 7 scanBamHeader. 51 readPileup.BcfFile-method (BcfFile).RangesList-method quickBamFlagSummary. 20.(ApplyPileupsParam). readGAlignments. 7 qnamePrefixEnd.missing-method query_bins (pileup). 5 plpYieldSize (ApplyPileupsParam). 49 scanBcf. 21 RangesList. 46. 30. 38. 7 scanBamFlag.(ApplyPileupsParam). 33. 47 quickBamFlagSummary. 13 qnamePrefixEnd (BamFile). 22. 15 plpYieldSize<.BamFileList-method ScanBamParam. 15–17. 23 razip (Compression). 23 readGAlignmentPairs. 28. 23 scanBcf. 45 plpYieldAll (ApplyPileupsParam). 5 RsamtoolsFileList-class plpYieldBy (ApplyPileupsParam).

7 (RsamtoolsFile). 47 27 show. 11. 54. 18 scanFa.BamFile-method (BamFile). 12 scanFaIndex. 7 yieldSize<-. 2 (TabixFile). 29 show.RangesList-method show.character. 54 yieldTabix (deprecated). 11 yieldSize<-. 52 (RsamtoolsFileList). 43 (FaFile). 11 scanFaIndex (FaInput). 26 . 51 show. sortBam. 31.TabixFile.TabixFile-method (ScanBcfParam-class).character. (ScanBamParam). 36 (TabixFile).FaFile-method (FaFile). TabixFile. (FaInput).RsamtoolsFileList-method scanVcfHeader. 45 seqinfo. 46 ScanVcfParam. 54 (testPairedEndBam). 54 (deprecated).BamFile-method (BamFile). 54 (RsamtoolsFile).BamFileList-method (BamFile).ScanBVcfParam-method scanFa.GRanges-method (FaFile). 56 scanFa. 29 show.missing-method yieldSize.RangesList-method (RsamtoolsFile). 57 (testPairedEndBam).ANY-method TabixInput.PileupParam-method (pileup). 53 yieldSize<. 53 yieldSize.GRanges-method url-class (Rsamtools-package). 45 (FaInput). 46 seqnamesTabix.character.FaFile.BamFile-method (TabixInput). 27 sortBam. 58 scanTabix. 13 scanFaIndex.GRanges-method testPairedEndBam.FaFile-method (FaFile).ScanBamParam-method scanFa.missing-method show.RsamtoolsFile-method seqinfo.BamSampler-method (deprecated).character-method (BamInput).missing-method (FaFile). 54 ScanBcfParam. 56 TabixFileList (TabixFile).RangesList-method sink. 28. 57 TabixFileList-class (TabixFile). 51 (ApplyPileupsParam). 47.TabixFile.TabixFile-method (seqnamesTabix).FaFile. 57 27 TabixFile-class (TabixFile). 26 scanFa (FaInput). 13 scanFaIndex. 29 40 scanFa.ApplyPileupsParam-method (ScanBcfParam-class).character.character.BamFile-method (BamFile). 22. 7 ScanBVcfParam-class show.RangesList-method yieldSize. 54 testPairedEndBam. 57 unz-class (Rsamtools-package). 35. 54 scanTabix. (ScanBcfParam-class).GRanges-method show.BamFileList-method (BamFile). 27 (RsamtoolsFileList). 7 (ScanBcfParam-class). 5 ScanBcfParam-class. 45 scanVcf.(RsamtoolsFile). 54 scanTabix. 58 scanTabix.BamViews-method (BamViews).character-method (TabixFile).RsamtoolsFile-method scanFa. 54 scanTabix (TabixInput). 51 show. 54 scanTabix. 7 29 sortBam.missing-method testPairedEndBam.character-method yieldTabix. 57 (TabixFile).PileupFiles-method (PileupFiles). 51 27 SimpleList.character. 24–26.64 INDEX ScanBcfParam. 26 seqnamesTabix.RangesList-method (TabixInput). 29 show.RsamtoolsFileList-method seqinfo. 45 Seqinfo. 33 (FaInput). 29 sortBam (BamInput). 54.RsamtoolsFile-method (TabixFile).missing-method seqnamesTabix. 10.character.FaFileList-method (FaFile).TabixFile.FaFile.character-method (FaInput). 27 summarizeOverlaps. 54 yieldSize (RsamtoolsFile). 58 scanTabix. 51 (TabixFile). 2 scanTabix. 45 scanTabix.