R is an integrated suite of software facilities for data manipulation, calculation and graphical display.
A 'vector' is a list of numbers (or character or logical values), comparable to a column in a table.
A 'vector' can only contain elements of the same mode (type).
A 'data frame' is a table whose columns may each have a different mode, so the values in a row are not necessarily of the same kind.
A 'matrix' is a table in which every element has the same mode. It can also be seen as a set of vectors of the same length and mode.
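As a rough illustration of the three structures described above, the sketch below builds one of each. All variable names and values are made up for the example.

```r
# Hypothetical example values, not taken from any real data set
v  <- c(10, 20, 30)                     # vector: every element is numeric
m  <- matrix(1:6, nrow = 2, ncol = 3)   # matrix: 2 rows x 3 columns, one mode
df <- data.frame(id   = 1:3,            # data frame: columns may differ in mode
                 name = c("a", "b", "c"),
                 ok   = c(TRUE, FALSE, TRUE))

mode(v)            # mode of the vector
dim(m)             # dimensions of the matrix (rows, columns)
sapply(df, class)  # class of each data-frame column
```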
R is a function-based language: arguments and options go inside parentheses, and data and options are separated by commas.
'Function(data, options)'
e.g.
q() = quit
help() = help
help('if')
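A small sketch of the 'Function(data, options)' pattern, using two base R functions. The data values are invented for illustration.

```r
x <- c(4, 8, NA, 15)          # example data (made up)
mean(x)                       # data only: the missing value makes the result NA
mean(x, na.rm = TRUE)         # data plus an option: ignore missing values
round(3.14159, digits = 2)    # option passed by name
round(3.14159, 2)             # same option passed by position
```

Passing options by name is usually easier to read than relying on their position.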
..
RKWard is a GUI IDE for R
RKWard is not confined to working with just one piece of data at a time; rather, you have a "Workspace", where all your different variables, tables, etc. are accessible.
To create a new spreadsheet (data.frame), choose:
File->New->Dataset
Assign a name to this table
To open pre-existing data, choose:
File->Import->Load data #Loads data in CSV format
..
R from terminal
R
>x <- c(1,2,3) #Function 'c' combines values into a vector consisting of the numbers 1, 2 and 3, and assigns it to variable 'x'
>x #displays the vector 'x'
[1] 1 2 3
e.g.
>a <- c(1,2,5.3,6,-2,4) # numeric vector
>b <- c("one","two","three") # character vector
>d <- c(TRUE,TRUE,TRUE,FALSE,TRUE,FALSE) #logical vector
>f <- c(1, 'a', T) # Mixture
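Because a vector holds a single mode, a mixture like the one above is converted to one common type. The sketch below shows what happens; the variable name f follows the example above.

```r
f <- c(1, "a", TRUE)   # mixed input
f                      # every element has become a character string
class(f)               # "character"
class(c(1, TRUE))      # "numeric": the logical TRUE is converted to 1
```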

>library() #Displays all available (installed) packages, found in e.g. '/usr/lib/R/site-library'
'<-' is the assignment operator. Can also be the other way around ('->').
>5 -> a # assigns scalar 5 to variable 'a'
>a = 5 # assigns scalar 5 to variable 'a'
>a <- 5 # assigns scalar 5 to variable 'a'

>a <- c(1,2,3,4,5) # assigns vector to variable 'a'
>print(a) # prints value of 'a'
>a = c(1:10) #creates a vector with values ranging from 1 to 10

> a = seq(1,10) #creates a vector with values ranging from 1 to 10
> a = seq(1,40,by=0.5) #creates a sequence with values ranging from 1 to 40, incrementing by 0.5
> a
 [1]  1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5  5.0  5.5  6.0  6.5  7.0  7.5  8.0
[16]  8.5  9.0  9.5 10.0 10.5 11.0 11.5 12.0 12.5 13.0 13.5 14.0 14.5 15.0 15.5
[31] 16.0 16.5 17.0 17.5 18.0 18.5 19.0 19.5 20.0 20.5 21.0 21.5 22.0 22.5 23.0
[46] 23.5 24.0 24.5 25.0 25.5 26.0 26.5 27.0 27.5 28.0 28.5 29.0 29.5 30.0 30.5
[61] 31.0 31.5 32.0 32.5 33.0 33.5 34.0 34.5 35.0 35.5 36.0 36.5 37.0 37.5 38.0
[76] 38.5 39.0 39.5 40.0
> a = seq(length=40,from=4,by=0.5) #creates a sequence with 40 entries, starting from 4, with an increment of 0.5
> a = seq(length=100,from=0,by=10)
> a
  [1]   0  10  20  30  40  50  60  70  80  90 100 110 120 130 140 150 160 170
 [19] 180 190 200 210 220 230 240 250 260 270 280 290 300 310 320 330 340 350
 [37] 360 370 380 390 400 410 420 430 440 450 460 470 480 490 500 510 520 530
 [55] 540 550 560 570 580 590 600 610 620 630 640 650 660 670 680 690 700 710
 [73] 720 730 740 750 760 770 780 790 800 810 820 830 840 850 860 870 880 890
 [91] 900 910 920 930 940 950 960 970 980 990
> 1/a #Divides 1 by every element of the sequence 'a'
  [1]         Inf 0.100000000 0.050000000 0.033333333 0.025000000 0.020000000
  [7] 0.016666667 0.014285714 0.012500000 0.011111111 0.010000000 0.009090909
 [13] 0.008333333 0.007692308 0.007142857 0.006666667 0.006250000 0.005882353
 ...
 [97] 0.001041667 0.001030928 0.001020408 0.001010101
#sequences can be divided by sequences e.g.
> a = seq(length=40,from=0,by=0.1)
> a
 [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8
[20] 1.9 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.7
[39] 3.8 3.9
> b = seq(length=40,from=1,by=0)
> b
 [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[39] 1 1
> b/a
 [1]        Inf 10.0000000  5.0000000  3.3333333  2.5000000  2.0000000
 [7]  1.6666667  1.4285714  1.2500000  1.1111111  1.0000000  0.9090909
[13]  0.8333333  0.7692308  0.7142857  0.6666667  0.6250000  0.5882353
[19]  0.5555556  0.5263158  0.5000000  0.4761905  0.4545455  0.4347826
[25]  0.4166667  0.4000000  0.3846154  0.3703704  0.3571429  0.3448276
[31]  0.3333333  0.3225806  0.3125000  0.3030303  0.2941176  0.2857143
[37]  0.2777778  0.2702703  0.2631579  0.2564103
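The sequence examples above can be condensed into a short sketch. It spells the argument out as length.out (the notes use the abbreviated form length=), and the particular numbers are chosen only for illustration.

```r
a <- seq(from = 0, by = 0.1, length.out = 5)   # 5 values starting at 0, step 0.1
b <- seq(from = 2, to = 10, by = 2)            # from 2 up to 10, step 2
length(a) == length(b)                         # both sequences have 5 elements
b / a                                          # element-wise; the first result is Inf (2/0)
1 / b                                          # the scalar 1 is divided by every element
```

Division by zero does not stop the calculation: R returns Inf for that element and carries on.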

>a*8 #multiplies every element of the sequence by 8
>a+a #adds each element of sequence 'a' to itself
>mean(a) #computes mean of 'a'
[1] 1.95
>var(a) #computes variance of 'a'
[1] 1.366667
> sum(a) #computes sum of all elements of 'a'
[1] 78
>prod(a) #computes product of all elements of 'a'
[1] 0
> sqrt(a) #computes square root of all members of sequence 'a'
 [1] 0.0000000 0.3162278 0.4472136 0.5477226 0.6324555 0.7071068 0.7745967
 [8] 0.8366600 0.8944272 0.9486833 1.0000000 1.0488088 1.0954451 1.1401754
[15] 1.1832160 1.2247449 1.2649111 1.3038405 1.3416408 1.3784049 1.4142136
[22] 1.4491377 1.4832397 1.5165751 1.5491933 1.5811388 1.6124515 1.6431677
[29] 1.6733201 1.7029386 1.7320508 1.7606817 1.7888544 1.8165902 1.8439089
[36] 1.8708287 1.8973666 1.9235384 1.9493589 1.9748418
> length(a) #length of 'a'
[1] 40
> min(a) #minimum value of 'a'
[1] 0
>max(a) #maximum value of 'a'
[1] 3.9
> sort(a) #sorts sequence
 [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8
[20] 1.9 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.7
[39] 3.8 3.9
#Adding vectors of different lengths
> a <- c(1,2,3)
> b <- c(4)
> c <- c(1,2)
> a+b
[1] 5 6 7
#adds the only element of vector 'b' to every element of vector 'a'. This works cleanly as long as the length of the longer vector is a multiple of the length of the shorter one. Otherwise R still returns a result, but with a warning.
> a+c
[1] 2 4 4
Warning message:
In a + c : longer object length is not a multiple of shorter object length
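The behaviour when adding vectors of different lengths can be sketched as follows. The vectors are made up, and tryCatch is used here only to capture the warning text so that it can be inspected.

```r
a <- c(1, 2, 3, 4)
b <- c(10, 20)          # length 2 divides length 4: recycled silently
a + b                   # 11 22 13 24

d <- c(1, 2, 3)         # length 3 is not a multiple of length 2
# R would still return a result here, but it warns;
# the handler below returns the warning message instead
msg <- tryCatch(d + b, warning = function(w) conditionMessage(w))
msg
```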

.g. : 5.0 8 4 B 6.:10.25 Median :11. :11. Logic test c an also be assigned to a var.25 Median : 8.sep="...25 3rd Qu.csv"...> FALSE [1] FALSE > (a<1) [1] FALSE FALSE FALSE #displays logic for each element of vector a...00 velocity Min....cyclismo.d[1:3] #assigns elements of vector 'd'....00 B:3 1st Bu. : 8.75 Max.") > heisenberg trial mass velocity 1 A 10.: 6..50 Mean : 8.:10..00 1st Qu.5 13 6 B 7. You can now access each individual column using a "$" to separate the two names : If you are not sure what columns are contained in the variable you can use the n ...50 Mean :11. >logic <.0 12 2 A 11.read.0 10 5 A 10.(a<1) Relational operators '== != < > <= >= ' #Used in logic tests Boolean operators: & = logical AND | = logical OR ! = logical NOT > a [1] 1 2 3 > a[a>1] #displays elements of vector 'a' whose values are >1 [1] 2 3 > d [1] 2 4 4 > b <.38 Max. :14..... from position 1 to position 3 to variable 'b' > b [1] 2 4 4 Bioconductor: collection of packages for the analysis and comprehension of genom ic data. .00 Each column is assigned a name based on the header (the first line in the file). R tutorial at http://www..org/tutorial/R/ Reading a CSV file >heisenberg <....0 11 > summary(heisenberg) trial mass A:3 Min.csv(file="simple.33 3rd Qu.head=TRUE.:12. has hundreds of packages to deal with microarray analysis and NGS seque ncing data.0 14 3 B 5. e.

sep=". 'tree' is of type data. If you are not sure what kind of variable you have then you can use the attribut es command.ames command: > names(heisenberg) [1] "trial" "mass" "velocity" > heisenberg$mass [1] 10.0 10. It is common to come across data that is organized in flat files and delimited a t preset locations on each line. This is often called a "fixed width file.0 assigning a csv file to a variable called 'tree' > tree <.c('Hello'.csv(file="trees91.c('Hello there') > b [1] "Hello there" > b[1] [1] "Hello there" > b <.csv".0 11.header=TRUE.frame.").'there') > b[2] [1] "there" > b[0] character(0) Factors in R Often. This will list all of the things that R uses to describe the variabl e: > attributes(tree) $names [1] "C" "N" [9] "STNCC" "RTNCC" [17] "LFKCC" "STKCC" [25] "RTPCC" "LFSCC" "CHBR" "LFBCC" "RTKCC" "STSCC" "REP" "LFBM" "STBM" "RTBM" "LFNCC" "STBCC" "RTBCC" "LFCACC" "STCACC" "RTCACC" "LFMGCC" "STMGCC" "RTMGCC" "LFPCC" "STPCC" "RTSCC" $class [1] "data.c(1.5) > a[0] numeric(0) #Note: the zero entry indicates how data is stored.2.5 7. when looking at the impact of CO2 on the growth rate of a tree you might tr .0 6. > b <.g.names [1] "1" "2" [16] "16" "17" [31] "31" "32" [46] "46" "47" "3" "18" "33" "48" "4" "19" "34" "49" "5" "20" "35" "50" "6" "21" "36" "51" "7" "22" "37" "52" "8" "23" "38" "53" "9" "10" "11" "12" "13" "14" "15" "24" "25" "26" "27" "28" "29" "30" "39" "40" "41" "42" "43" "44" "45" "54" The first thing that R stores is a list of names which refer to each column of t he data. an experiment includes trials for different levels of some variable. For e.0 5.read.frame" $row. for info."The c ommand to deal with these kind of files is read.fwf.4. use 'help(read.fw f)' > ls() #used to get a list of the variables defined in a session [1] "a" "b" > a <.3.

several of the columns are factors.000 2. 2. > summary(tree$CHBR) A1 A2 A3 A4 A5 A6 A7 B1 B2 B3 B4 B5 B6 B7 C1 C2 C3 C4 C5 C6 3 1 1 3 1 3 1 1 3 3 3 3 3 3 1 3 1 3 1 1 C7 CL6 CL7 D1 D2 D3 D4 D5 D6 D7 1 1 1 1 1 3 1 1 1 1 In this data set. the first column. . labeled "C. The researchers quite sensibly labeled th ese four environments as 1.519 3. Unfortunately.000 > tree$C <. 3. A set of factors have a decrete set of possible values. You specify that a variable is a factor using the factor comm and.y to observe how different trees respond when exposed to different preset concen trations of CO2. Max. Each trees was grown in an environment with one of four differ ent possible levels of carbon dioxide. Median Mean 3rd Qu. but the researchers used n umbers to indicate the different levels.000 2. In the following example we convert tree$C into a factor: > tree$C [1] 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 [39] 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 > summary(tree$C) Min. This is a common problem and there is a way to tell R to treat the "C" column as a set of factors. R cannot determine that these are factors and must assume that they are regular numbers.000 2. One thing that is important is the number of times that ea ch factor appears." is a factor.000 4. and it does not make sense to try to find averages or other nu merical descriptions. called their "frequencies. The different levels are also called factors. For example. 1st Qu. 1.factor(tree$C) > tree$C [1] 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 [39] 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 Levels: 1 2 3 4 > summary(tree$C) 1 2 3 4 8 23 10 13 > levels(tree$C) [1] "1" "2" "3" "4" Once a vector is converted into a set of factors then R treats it in a different manner then when it is a set of numbers." which is printed using the summar y command. and 4.
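The conversion described above can be tried without the tree data. The sketch below uses a short made-up vector of treatment codes (co2 and co2f are hypothetical names) to show how the summary changes once the codes are declared a factor.

```r
co2 <- c(1, 1, 2, 2, 2, 3, 4, 4)   # made-up treatment codes stored as numbers
mean(co2)                          # R averages the codes, which is not meaningful here
co2f <- factor(co2)                # tell R the codes are categories
levels(co2f)                       # the distinct levels, as character strings
summary(co2f)                      # frequency of each level
table(co2f)                        # the same counts via table()
```

table() is an alternative to summary() when only the frequencies are wanted.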