
Lecture 4

Xiaotong Suo

Homework 1
• Question 3

Today's agenda
• Data input/output
• Graphics

Data input/output
• R can write matrices and data frames to a file using the function 'write.table', and read data from a file using 'read.table'.
• If you have a tab-delimited file, use the function 'read.delim' instead. If the file is comma-separated, then use 'read.csv'.

    Year, Student, Major
    2009, John Doe, Statistics
    2009, Bart Simpson, Mathematics I

• The above is an example of a comma-separated file. Tab-delimited is the same except that tabs are the separator.

Data input/output continued
• The data set 'airquality' is available in R and gives weather measurements in New York City over some period of time. Load that data set into a data frame and save it to a file.

Data input/output continued
  – dt=airquality
  – write.table(dt,'Airquality.dt',row.names=F,col.names=T,sep=" ",na='Missing')
• You could also use 'write.csv'. See the help documentation for details.

Data input/output continued
• Things to keep in mind when reading from or writing to a file:
• Header: whether the file has a first row giving the names of the variables.
• Separator: which field separator is used: space, comma, or tab.
• Missing data character string: which character strings serve as missing data.
• Do you want to allow R to convert character variables to factors? Use the options stringsAsFactors and as.is.
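The effect of these options can be seen on a small example. The sketch below is only illustrative: it reads from an in-memory string instead of a file, and the string 'Missing' standing in for a missing major is an assumption made for the demonstration.

```r
# Hypothetical in-memory "file" so the sketch is self-contained.
txt <- "Year,Student,Major
2009,John Doe,Statistics
2009,Bart Simpson,Missing"

# Same data read twice, once keeping strings and once converting to factors.
d1 <- read.csv(text = txt, header = TRUE, na.strings = "Missing",
               stringsAsFactors = FALSE)
d2 <- read.csv(text = txt, header = TRUE, na.strings = "Missing",
               stringsAsFactors = TRUE)

class(d1$Student)   # "character"
class(d2$Student)   # "factor"
is.na(d1$Major)     # FALSE TRUE: 'Missing' was read as NA
```

With header=TRUE the first row supplies the variable names; with header=FALSE R would name the columns V1, V2, V3 and treat the first row as data.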

Data input/output continued
• The general syntax of read.table:
  – mydata=read.table('<file>.dat',header=F,sep=' ',dec='.',col.names=c('V1','V2'),na.strings='NA')

Data input/output continued
• Let's try it with the file just saved.
  – dtNew=read.table('Airquality.dt',header=T,sep=' ',dec='.',na.strings='Missing')
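A quick way to check that writing and reading agree is a round trip. The following is a minimal sketch, assuming a temporary file (so nothing is left in the working directory) and the same separator and missing-value string on both sides.

```r
# Round trip: write airquality with a custom missing-value marker, read it back.
path <- tempfile(fileext = ".dt")   # temporary file; the name is arbitrary
write.table(airquality, path, row.names = FALSE, col.names = TRUE,
            sep = " ", na = "Missing")
dtNew <- read.table(path, header = TRUE, sep = " ", dec = ".",
                    na.strings = "Missing")

dim(dtNew)                                               # same shape as airquality
sum(is.na(dtNew$Ozone)) == sum(is.na(airquality$Ozone))  # missing values restored
all.equal(dtNew, airquality)                             # compares values and types
```

If na.strings does not match the string used when writing, the affected columns are read as character rather than numeric, which is usually the first symptom of a mismatch.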

Data input/output continued
• As mentioned earlier, if you have a tab-delimited file, use the function 'read.delim' instead. If the file is comma-separated, then use 'read.csv'.
• Another function to read text data is 'read.fwf', which works with fixed-width text data. See the user manual.
• Yet another function to read data from a file is 'scan'. It is more efficient when reading data of a single mode. See the user manual for more detail.
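To make the last two functions concrete, here is a small sketch. The file contents, the column widths and the column names (Year, Student, Score) are all made up for illustration.

```r
# Fixed-width data: characters 1-4 are the year, 5-12 the name, 13-16 a score.
fwf_path <- tempfile()
writeLines(c("2009John Doe 3.5",
             "2009Bart Sim 2.1"), fwf_path)
d <- read.fwf(fwf_path, widths = c(4, 8, 4),
              col.names = c("Year", "Student", "Score"))
d

# scan() reads values of a single mode into a plain vector.
num_path <- tempfile()
writeLines("1 2 3 4 5 6", num_path)
v <- scan(num_path, what = numeric(), quiet = TRUE)
sum(v)   # 21
```

read.fwf returns a data frame, while scan returns a vector, which is why scan suits large files that hold only numbers.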

Data input/output
• Exercise: The file 'Earmarksbymember08.xls' is an Excel file available in coursework. Load this file in R.

Graphics
• R has a powerful graphical capability. To plot a graph you need a graphical device. If you launch your plot right away, R will automatically create a graphical device for you.
• On Windows systems, use
  – windows()
• On Mac OS, use the function
  – quartz()
  to create a graphical device.
• A graphical device can also be a file. Your graphs are then sent to that file. Use the functions
  – pdf()
  – postscript()
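As a minimal sketch of a file device, the code below sends one plot to a PDF. The output path is hypothetical (a file in R's temporary directory), and the width and height are arbitrary choices.

```r
out <- file.path(tempdir(), "temp_boxplot.pdf")  # hypothetical output path
pdf(out, width = 6, height = 4)   # open the file device (size in inches)
boxplot(airquality$Temp, main = "Temperature")
dev.off()                         # close the device so the file is written
file.exists(out)
```

The call to dev.off() matters: until the device is closed, the file may be incomplete.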

Graphics continued
• Example: the airquality data set.
  – dt=airquality
  – names(dt)
  – boxplot(dt$Temp)
  – plot(dt$Temp,type='l')
  – plot(dt$Temp,dt$Wind,type='p')
  – plot(dt$Temp,dt$Wind,type='p',xlab='Temperature',ylab='Wind',main='Wind vs Temp. in NY city May-Sept. 73')

Graphics
• Continuing with the airquality data set, suppose we want to do a boxplot of the data from each month.
  – dt$Month=as.factor(dt$Month)
  – boxplot(Temp ~ Month,data=dt,names=c('May','June','July','August','Sept.'))

Graphics
• What if we want to have multiple graphics on the same graphical device? There are many ways to do this.
• One simple possibility is 'layout'.

Graphics
• Example: the airquality data set.
  – m=matrix(c(1,2),ncol=2)
  – layout(m)
  – layout.show(2)
  – boxplot(dt$Temp,main='Boxplot')
  – plot(dt$Temp,type='l',main='Time series plot')

Graphics
• Example: the airquality data set.
  – m=matrix(c(1,2,3,3),2)
  – layout(m)
  – layout.show(3)
  – boxplot(dt$Temp,main='Boxplot Temp. in NY city')
  – plot(dt$Temp,type='l',main='Temp. in NY city')
  – plot(dt$Temp,dt$Wind,type='p',xlab='Temp',ylab='Wind',main='xyplot')
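Another of the "many ways" to place several plots on one device is the mfrow graphical parameter. The sketch below is an alternative to layout, not the method used on the slides; it draws on a temporary PDF device so that it runs the same way on any system.

```r
dt <- airquality
out <- tempfile(fileext = ".pdf")  # temporary file device
pdf(out)
old <- par(mfrow = c(1, 2))        # 1 row, 2 columns; keep the old settings
boxplot(dt$Temp, main = "Boxplot")
plot(dt$Temp, type = "l", main = "Time series plot")
par(old)                           # restore the previous settings
dev.off()
```

mfrow only produces a regular grid of equal cells; layout is the more flexible choice when one plot should span several cells.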

Graphics continued
• What if we want to put multiple graphs on the same plot?
• Call
  – par(new=T)
  first, before the next plotting command.

Graphics continued
• A few plotting functions in R:
  – plot(x): plot the values of vector x.
  – plot(x,y): bivariate plot of y as a function of x.
  – hist(x): produce a histogram of x.
  – boxplot(x): "box-and-whiskers" plot.
  – ... many others. See the R manual by typing help.start().

Graphics continued
• Example:
  – n=10000
  – X=rnorm(n)
  – hist(X,prob=T,breaks=200,col='blue',xlim=c(-4,4),ylim=c(0,0.4))
  – par(new=T)
  – curve(dnorm,xlim=c(-4,4),ylim=c(0,0.4),lwd=2,col='red',xlab='',ylab='')
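With par(new=T) the second plot draws its own axes, so xlim and ylim must be matched by hand in both calls. A possible alternative, sketched here under the assumption that a single set of axes is wanted, is to let curve add to the existing histogram with add=TRUE. The seed and the temporary PDF device are arbitrary choices for reproducibility.

```r
set.seed(1)                        # arbitrary seed, only for reproducibility
n <- 10000
X <- rnorm(n)
out <- tempfile(fileext = ".pdf")  # temporary file device
pdf(out)
h <- hist(X, prob = TRUE, breaks = 200, col = "blue",
          xlim = c(-4, 4), xlab = "", ylab = "", main = "")
# add = TRUE draws on the histogram's coordinate system; no par(new=T) needed
curve(dnorm, from = -4, to = 4, add = TRUE, lwd = 2, col = "red")
dev.off()
```

Because prob=TRUE puts the histogram on the density scale, the bars and the normal density curve are directly comparable.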

Graphics
• Example:
  – X=rnorm(100)
  – Y=rnorm(100)
  – m=matrix(c(1,2),ncol=2)
  – layout(m)
  – plot(X,Y)
  – plot(X,Y,pch=4,col='blue',xlab='100 Normal rvs',ylab='100 Normal rvs',main='Example of plot in R')

Graphics continued
• Exercise: The Californian freeway performance measurement system. The data is 'flow-occ-table.txt' in coursework. Download the file to your computer and load it in R using read.table. Practice with the following code.

Graphics continued
  – dt=read.table('flow-occ-table.txt',header=T,sep=',')
  – names(dt)
  – Ind=complete.cases(dt)
  – sum(Ind)
  – length(dt[,1])
  – attach(dt)
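The same checks can be tried without the freeway file. The sketch below uses the built-in airquality data as a stand-in (an assumption made so the code is self-contained) and uses with() as one alternative to attach() for referring to columns by name.

```r
dt <- airquality            # stand-in data set; the freeway file is not needed here
Ind <- complete.cases(dt)   # TRUE for rows with no missing value
sum(Ind)                    # number of complete rows
nrow(dt)                    # total number of rows, same as length(dt[,1])
clean <- dt[Ind, ]          # keep only the complete rows
with(clean, mean(Ozone))    # columns by name, without attach()
```

Comparing sum(Ind) with the total number of rows shows how many rows would be lost by dropping incomplete cases.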

Graphics continued
  – m=matrix(c(1,2,3,4,5,5,5,5),ncol=4)
  – layout(m)
  – boxplot(Flow1,Flow2,Flow3,names=c('Flow1','Flow2','Flow3'),main='Boxplots flows')
  – boxplot(Occ1,Occ2,Occ3,names=c('Occ1','Occ2','Occ3'),main='Boxplots Occup.')
  – plot(Occ2,Flow2,type='p',col='blue',main='Flow vs Occup. for Lane 2')
  – plot(Occ3,Flow3,type='p',col='red',main='Flow vs Occup. for Lane 3')

Graphics continued
  – plot(Occ1,type='l',xlim=c(0,1700),ylim=c(0,0.5),col='green')
  – par(new=T)
  – plot(Occ2,type='l',xlim=c(0,1700),ylim=c(0,0.5),col='blue')
  – par(new=T)
  – plot(Occ3,type='l',xlim=c(0,1700),ylim=c(0,0.5),col='red',main='Occup. for Lane 1,2 and 3')

Graphics
  – legend(x='top',legend=c('Lane 1','Lane 2','Lane 3'),col=c('green','blue','red'),lty=c(1,1,1))
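An overlay with a legend can also be built with plot followed by lines, which avoids repeating xlim and ylim in every call. This is a sketch on made-up data: the three series below are random numbers, not the freeway occupancies, and the output goes to a temporary PDF device.

```r
set.seed(2)                              # arbitrary seed
idx <- 1:100
occ <- cbind(runif(100, 0.0, 0.2),       # made-up values standing in for
             runif(100, 0.1, 0.3),       # the three occupancy series
             runif(100, 0.2, 0.5))
cols <- c("green", "blue", "red")
out <- tempfile(fileext = ".pdf")        # temporary file device
pdf(out)
plot(idx, occ[, 1], type = "l", col = cols[1], ylim = c(0, 0.5),
     xlab = "Index", ylab = "Occupancy", main = "Occup. for Lane 1, 2 and 3")
lines(idx, occ[, 2], col = cols[2])      # lines() adds to the existing plot
lines(idx, occ[, 3], col = cols[3])
legend("top", legend = c("Lane 1", "Lane 2", "Lane 3"), col = cols, lty = 1)
dev.off()
```

The colors passed to legend must be listed in the same order as the series were drawn, otherwise the key is misleading.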

ggplot2
• http://cran.r-project.org/web/packages/ggplot2/index.html
• Produces much nicer plots.
• Install the package first in R and type library(ggplot2)
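For a first impression, here is a minimal ggplot2 sketch of the earlier wind versus temperature scatter plot. It assumes the package has already been installed (for example with install.packages("ggplot2")) and does nothing if it has not; the labels are arbitrary.

```r
# Assumes ggplot2 is installed; the guard skips the example otherwise.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(airquality, aes(x = Temp, y = Wind)) +
    geom_point() +
    labs(x = "Temperature", y = "Wind", title = "Wind vs Temp.")
  # print(p) draws the plot on the current graphical device
} else {
  p <- NULL
}
```

Unlike the base functions, ggplot builds a plot object that is only drawn when it is printed.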

Control structures
• So far we have learned some of the basic aspects of R: working with its basic objects, input/output, graphics. Here, we learn the more general task of writing computer programs using R.

Control structures continued
• An important component of a programming language is control structures to implement repetitive tasks.
• The R programming language has control structures similar to those of C.

For loops
• Loops are used to carry out a sequence of related operations without having to write the code for each step explicitly. For instance, suppose we want to calculate:

    $\sum_{i=1}^{10} i$

For loops continued
  – x=0
  – for (i in 1:10) {
      x=x+i
    }

For loops
• In the above program, x is an accumulator variable, meaning that its value is repeatedly updated while the program runs.
• Always remember to initialize accumulator variables (to zero in the example).
• To clarify, we can add a print statement inside the loop body.
  – x=0
  – for (i in 1:10) {
      x=x+i
      print(c(i,x))
    }
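The result of the loop can be checked against built-in functions. This is just a sanity check, not part of the slides' program.

```r
x <- 0                 # initialize the accumulator
for (i in 1:10) {
  x <- x + i
}
x == sum(1:10)         # TRUE: sum() gives the same total
x == 10 * 11 / 2       # TRUE: closed form n(n+1)/2 with n = 10
cumsum(1:10)           # the running totals that print(c(i,x)) shows step by step
```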

For loops
• The general structure of 'for' loops:
  – for (var in seq) expr
  or
  – for (var in seq) {
      expr
    }
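Here seq can be any vector, not only 1:n. The sketch below, on made-up values, shows both forms and one common pitfall: 1:length(v) on an empty vector is c(1, 0), so the loop body runs twice, whereas seq_along(v) gives an empty sequence.

```r
majors <- c("Statistics", "Mathematics")   # hypothetical values
for (m in majors) print(nchar(m))          # single-expression form

for (k in seq_along(majors)) {             # braces form, looping over positions
  print(paste(k, majors[k]))
}

empty <- numeric(0)
n_bad <- 0
for (i in 1:length(empty)) n_bad <- n_bad + 1      # 1:0 is c(1, 0): runs twice
n_good <- 0
for (i in seq_along(empty)) n_good <- n_good + 1   # runs zero times
c(n_bad, n_good)                                   # 2 0
```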

For loops continued
• Exercise: Given a matrix A, write a for loop that calculates the sum of each row of A.

For loops continued
• This is an example of a "trivial" for loop.
• There is no need to write such loops in R because it provides a simple class of functions to do just that: the "apply" functions.
• Often the apply functions even lead to faster code (but not always).
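A small comparison makes the point. The matrix below is an arbitrary example; the loop is one possible solution, and rowSums is a base R function specialized for this task.

```r
A <- matrix(1:6, nrow = 2)     # 2 x 3 matrix with rows (1,3,5) and (2,4,6)

s <- numeric(nrow(A))          # initialize the result vector
for (i in 1:nrow(A)) {
  s[i] <- sum(A[i, ])          # sum of row i
}
s                              # 9 12

apply(A, 1, sum)               # same result: apply sum over rows (margin 1)
rowSums(A)                     # same result with a dedicated function
```

Using margin 2 in apply (or colSums) gives the column sums instead.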

Next lecture
• More control structures
• R in Statistics (linear regression, etc.)