You are on page 1of 39

Read and Write CSV Files in R

One of the easiest and most reliable ways of getting data into R is to use CSV files.
The CSV file (Comma Separated Values file) is a widely supported file format
used to store tabular data. It uses commas to separate the different values in a
line, where each line is a row of data.
R’s Built-in csv parser makes it easy to read, write, and process data from CSV
files.

Read a CSV File


Suppose you have the following CSV file.
mydata.csv
name,age,job,city
Bob,25,Manager,Seattle
Sam,30,Developer,New York
You can open a file and read its contents by using the read.csv() function
specifying its name. It reads the data into a data frame.

# Read entire CSV file into a data frame


mydata <- read.csv("data.csv")
mydata
name age job city
1 Bob 25 Manager Seattle
2 Sam 30 Developer New York

Specify a File
When you specify the filename only, it is assumed that the file is located in the
current folder. If it is somewhere else, you can specify the exact path that the file is
located at.
Remember! while specifying the exact path, characters prefaced by \ (like \n \r \t
etc.) are interpreted as special characters.
You can escape them by:
 Changing the backslashes to forward slashes like: "C:/data/myfile.csv"
 Using the double backslashes like: "C:\\data\\myfile.csv"d

# Specify absolute path like this


mydata <- read.csv("C:\Users\MANI\OneDrive\Desktop\R")

# or like this
mydata <- read.csv("C:\\data\\mydata.csv")
If you want to read CSV Data from the Web, substitute a URL for a file name.
The read.csv() functions will read directly from the remote server.

# Read CSV file from Web


mydata <- read.csv("http://www.example.com/download/mydata.csv")
R can also read data from FTP servers, not just HTTP servers.

# Read CSV file from FTP server


mydata <- read.csv("ftp://ftp.example.com/download/mydata.csv")

Set Column Names


The read.csv() function assumes that the first line of your file is a header line. It
takes the column names from the header line for the data frame.

# By default first row is used to name columns


mydata <- read.csv("mydata.csv")
mydata
name age job city
1 Bob 25 Manager Seattle
2 Sam 30 Developer New Yorkm
If your file does not contain a header like the file below, then you should
specify header=FALSE so that R can create column names for you (V1, V2, V3
and V4 in this case)
mydata.csv
Bob,25,Manager,Seattle
Sam,30,Developer,New York

# If your file doesn't contain a header, set header to FALSE


mydata <- read.csv("mydata.csv",
header = FALSE)
mydata
V1 V2 V3 V4
1 Bob 25 Manager Seattle
2 Sam 30 Developer New York
However, if you want to manually set the column names, you
specify col.names argument.

# Manually set the column names


mydata <- read.csv("mydata.csv",
header = FALSE,
col.names = c("name", "age", "job", "city"))
mydata
name age job city
1 Bob 25 Manager Seattle
2 Sam 30 Developer New York

Import the Data as is


The read.csv() function automatically coerces non-numeric data into
a factor (categorical variable). You can see that by inspecting the structure of your
data frame.

# By default, non-numeric data is coerced into a factor


mydata <- read.csv("data.csv")
str(mydata)
'data.frame': 2 obs. of 4 variables:
$ name: Factor w/ 2 levels "Bob","Sam": 1 2
$ age : int 25 30
$ job : Factor w/ 2 levels "Developer","Manager": 2 1
$ city: Factor w/ 2 levels "New York","Seattle": 2 1
If you want your data interpreted as string rather than a factor, set
the as.is parameter to TRUE.

# Set as.is parameter to TRUE to interpret the data as is


mydata <- read.csv("mydata.csv",
as.is = TRUE)
str(mydata)
'data.frame': 2 obs. of 4 variables:
$ name: chr "Bob" "Sam"
$ age : int 25 30
$ job : chr "Manager" "Developer"
$ city: chr "Seattle" "New York"

Set the Classes of the Columns


You can manually set the classes of the columns using the colClasses argument.

mydata <- read.csv("mydata.csv",


colClasses = c("character", "integer", "factor", "character"))
str(mydata)
'data.frame': 2 obs. of 4 variables:
$ name: chr "Bob" "Sam"
$ age : int 25 30
$ job : Factor w/ 2 levels "Developer","Manager": 2 1
$ city: chr "Seattle" "New York"
Limit the Number of Rows Read
If you want to limit the number of rows to read in, specify nrows argument.

# Read only one record from CSV


mydata <- read.csv("mydata.csv",
nrows = 1)
mydata
name age job city
1 Bob 25 Manager Seattle

Handle Comma Within a Data


Sometimes your CSV file contains fields such as an address that contains a
comma. This can become a problem when working with a CSV file.
To handle comma within a data, wrap it in quotes. R considers a comma in a
quoted string as an ordinary character.
You can specify the character to be used for quoting using the quote argument.
mydata.csv
name,age,address
Bob,25,"113 Cherry St, Seattle, WA 98104, USA"
Sam,30,"150 Greene St, New York, NY 10012, USA"

mydata <- read.csv("mydata.csv"


quote = '"')
mydata
name age address
1 Bob 25 113 Cherry St, Seattle, WA 98104, USA
2 Sam 30 150 Greene St, New York, NY 10012, USA
Write a CSV File
To write to an existing file, use write.csv() method and pass the data in the form
of matrix or data frame.

# Write a CSV File from a data frame


df
name age job city
1 Bob 25 Manager Seattle
2 Sam 30 Developer New York

write.csv(df, "mydata.csv")
mydata.csv
"","name","age","job","city"
"1","Bob","25","Manager","Seattle"
"2","Sam","30","Developer","New York"
Notice that the write.csv() function prepends each row with a row name by default.
If you don’t want row labels in your CSV file, set row.names to FALSE.

# Remove row labels while writing a CSV File


write.csv(df, "mydata.csv",
row.names = FALSE)
mydata.csv
"name","age","job","city"
"Bob","25","Manager","Seattle"
"Sam","30","Developer","New York"
Notice that all the values are surrounded by double quotes by default. Set quote =
FALSE to change that.

# Write a CSV file without quotes


write.csv(df, "mydata.csv",
row.names = FALSE,
quote = FALSE)
mydata.csv
name,age,job,city
Bob,25,Manager,Seattle
Sam,30,Developer,New York

Append Data to a CSV File


By default, the write.csv() function overwrites entire file content. To append the
data to a CSV File, use the write.table() method instead and set append = TRUE.

df
name age job city
1 Amy 20 Developer Houston

write.table(df, "mydata.csv",
append = TRUE,
sep = ",",
col.names = FALSE,
row.names = FALSE,
quote = FALSE)
mydata.csv
name,age,job,city
Bob,25,Manager,Seattle
Sam,30,Developer,New York
Amy,20,Developer,Houston

Read and Write Excel Files in R


Excel is the most popular spreadsheet software used to store tabular data. So,
it’s important to be able to efficiently import and export data from these files.

R’s xlsx package makes it easy to read, write, and format excel files.


The xlsx Package
The xlsx package provides necessary tools to interact with
both .xls or .xlsx format files from R.

In order to get started you first need to install and load the package.

# Install and load xlsx package


install.packages("xlsx")
library("xlsx")

Read an Excel file


Suppose you have the following Excel file.

You can read the contents of an Excel worksheet using


the read.xlsx() or read.xlsx2() function.

The read.xlsx() function reads the data and creates a data frame.

# Read the first excel worksheet


library(xlsx)
mydata <- read.xlsx("mydata.xlsx", sheetIndex=1)
mydata
name age job city
1 Bob 25 Manager Seattle
2 Sam 30 Developer New York
3 Amy 20 Developer Houston
read.xlsx() vs read.xlsx2()
Both the functions work exactly the same except, read.xlsx() is slow for large data
sets (worksheet with more than 100 000 cells).
On the contrary, read.xlsx2() is faster on big files.

Specify a File Name


When you specify the filename only, it is assumed that the file is located in the
current folder. If it is somewhere else, you can specify the exact path that the file
is located at.

Remember! While specifying the exact path, characters prefaced by \ (like \n \r \t


etc.) are interpreted as special characters.

You can escape them using:

 Changing the backslashes to forward slashes like: "C:/data/myfile.xlsx"

 Using the double backslashes like: "C:\\data\\myfile.xlsx"

# Specify absolute path like this


mydata <- read.csv("C:/data/mydata.xlsx")

# or like this
mydata <- read.csv("C:\\data\\mydata.xlsx")

Specify Worksheet
When you use read.xlsx() function, along with a filename you also need to specify
the worksheet that you want to import data from.

To specify the worksheet, you can pass either an integer indicating the position
of the worksheet (for example, sheetIndex=1) or the name of the worksheet (for
example, sheetName="Sheet1" )
The following two lines do exactly the same thing; they both import the data in
the first worksheet (called Sheet1):

mydata <- read.xlsx("mydata.xlsx", sheetIndex = 1)

mydata <- read.xlsx("mydata.xlsx", sheetIndex = "Sheet1")

Import the Data as is


The read.xlsx() function automatically coerces character data into
a factor (categorical variable). You can see that by inspecting the structure of
your data frame.

# By default, character data is coerced into a factor


mydata <- read.xlsx("mydata.xlsx", sheetIndex = 1)
str(mydata)
'data.frame': 3 obs. of 4 variables:
$ name: Factor w/ 3 levels "Amy","Bob","Sam": 2 3 1
$ age : num 25 30 20
$ job : Factor w/ 2 levels "Developer","Manager": 2 1 1
$ city: Factor w/ 3 levels "Houston","New York",..: 3 2 1

If you want your data interpreted as string rather than a factor, set
the stringsAsFactors parameter to FALSE.

# Set stringsAsFactors parameter to TRUE to interpret the data as is


mydata <- read.xlsx("mydata.xlsx",
sheetIndex = 1,
stringsAsFactors = FALSE)
str(mydata)
'data.frame': 3 obs. of 4 variables:
$ name: chr "Bob" "Sam" "Amy"
$ age : num 25 30 20
$ job : chr "Manager" "Developer" "Developer"
$ city: chr "Seattle" "New York" "Houston"

Read Specific Range


If you want to read a range of rows, specify the rowIndex argument.

# Read first three lines of a file


mydata <- read.xlsx("mydata.xlsx",
sheetIndex = 1,
rowIndex = 1:3)
mydata
name age job city
1 Bob 25 Manager Seattle
2 Sam 30 Developer New York

If you want to read a range of columns, specify the colIndex argument.

# Read first two columns of a file


mydata <- read.xlsx("mydata.xlsx",
sheetIndex = 1,
colIndex = 1:2)
mydata
name age
1 Bob 25
2 Sam 30
3 Amy 20

Specify Starting Row


Sometimes the excel file (like the file below) may contain notes, comments,
headers, etc. at the beginning which you may not want to include.
To start reading data from a specified row in the excel worksheet,
pass startRow argument.

# Read excel file from third row


mydata <- read.xlsx("mydata.xlsx",
sheetIndex = 1,
startRow = 3)
mydata
name age job city
1 Bob 25 Manager Seattle
2 Sam 30 Developer New York
3 Amy 20 Developer Houston

Write Data to an Excel File


To write to an existing file, use write.xlsx() method and pass the data in the form of
matrix or data frame.

# Export data from R to an excel workbook


df
name age job city
1 Bob 25 Manager Seattle
2 Sam 30 Developer New York
3 Amy 20 Developer Houston
write.xlsx(df, file = "mydata.xlsx")

Notice that the write.xlsx() function prepends each row with a row name by default.
If you don’t want row labels in your excel file, set row.names to FALSE.

# Remove row labels while writing an excel File


write.xlsx(df, file="mydata.xlsx",
row.names = FALSE)

To set the name of the current worksheet, specify sheetName argument.

# Rename current worksheet


write.xlsx(df, file="mydata.xlsx",
row.names = FALSE,
sheetName = "Records")
Add Multiple Datasets at once
To add multiple data sets in the same Excel workbook, you have to set
the append argument to TRUE.

# Write the first data set


write.xlsx(iris, file = "mydata.xlsx",
sheetName = "IRIS", append = FALSE)

# Add a second data set


write.xlsx(mtcars, file = "mydata.xlsx",
sheetName = "CARS", append = TRUE)

# Add a third data set


write.xlsx(Titanic, file = "mydata.xlsx",
sheetName = "TITANIC", append = TRUE)

Create and Format an Excel Workbook


Sometimes you may wish to create a .xlsx file with some formatting. With the
help of xlsx package, you can edit titles, borders, column width, format data
table, add plot and much more.

The following example shows how to do so:


Step 1. Create a new excel workbook
You can create a new workbook using the createWorkbook() function.

# create new workbook


wb <- createWorkbook()

Step 2. Define cell styles for formatting the workbook


In R, using the CellStyle() function you can create your own cell styles to change
the appearance of, for example:

 The sheet title

 The row and column names

 Text alignment for the columns

 Cell borders around the columns

# define style for title


title_style <- CellStyle(wb) +
Font(wb, heightInPoints = 16,
isBold = TRUE)

# define style for row and column names


rowname_style <- CellStyle(wb) +
Font(wb, isBold = TRUE)
colname_style <- CellStyle(wb) +
Font(wb, isBold = TRUE) +
Alignment(wrapText = TRUE, horizontal = "ALIGN_CENTER") +
Border(color = "black",
position =c("TOP", "BOTTOM"),
pen =c("BORDER_THIN", "BORDER_THIN"))
Step 3. Create worksheet and add title
Before you add data, you have to create an empty worksheet in the workbook.
You can do this by using the creatSheet() function.

# create a worksheet named 'Data'


ws <- createSheet(wb, sheetName = "Data")

Step 4. Add sheet title


Here’s how you can add a title.

# create a new row


rows <- createRow(ws, rowIndex = 1)

# create a cell in the row to contain the title.


sheetTitle <- createCell(rows, colIndex = 1)

# set the cell value


setCellValue(sheetTitle[[1,1]], "Vapor Pressure of Mercury")

# set the cell style


setCellStyle(sheetTitle[[1,1]], title_style)

Step 5. Add a table into a worksheet


With the addDataframe() function, you can add the data table in the newly created
worksheet.

Below example adds built-in pressure dataset on row #3.

# add data table to worksheet


addDataFrame(pressure, sheet = ws, startRow = 3, startColumn = 1,
colnamesStyle = colname_style,
rownamesStyle = rowname_style,
row.names = FALSE)

Step 6. Add a plot into a worksheet


You can add a plot in the worksheet using the addPicture() function.

# create a png plot


png("plot.png", height=900, width=1600, res=250, pointsize=8)
plot(pressure, xlab = "Temperature (deg C)",
ylab = "Pressure (mm of Hg)",
main = "pressure data: Vapor Pressure of Mercury",
col="red", pch=19, type="b")
dev.off()

# Create a new sheet to contain the plot


sheet <-createSheet(wb, sheetName = "plot")

# Add the plot created previously


addPicture("plot.png", sheet, scale = 1, startRow = 2,
startColumn = 1)

# Remove the plot from the disk


res<-file.remove("plot.png")

Step 7. Change column width


Now change the column width to fit the contents.

# change column width of first 2 columns


setColumnWidth(sheet = ws, colIndex = 1:2, colWidth = 15)

Step 8. Save the workbook


Finally, save the workbook with the saveWorkbook() function.
# save workbook
saveWorkbook(wb, file = "mydata.xlsx")

Step 9. View the result


R plot() Function
An effective and accurate data visualization is an important part of a statistical
analysis. It can make your data come to life and convey your message in a
powerful way.

R has very strong graphics capabilities that can help you visualize your data.

The plot() function


In R, the base graphics function to create a plot is the plot() function. It has many
options and arguments to control many things, such as the plot type, labels, titles
and colors.

Syntax
The syntax for the plot() function is:
plot(x,y,type,main,xlab,ylab,pch,col,las,bty,bg,cex,…)

Parameters

Parameter Description

x The coordinates of points in the plot

y The y coordinates of points in the plot

type The type of plot to be drawn

main An overall title for the plot

xlab The label for the x axis

ylab The label for the y axis


pch The shape of points

col The foreground color of symbols as well as lines

las The axes label style

bty The type of box round the plot area

bg The background color of symbols (only 21 through 25)

cex The amount of scaling plotting text and symbols

… Other graphical parameters

Create a Simple Plot


To get started with plot, you need a set of data to work with.
Let’s consider the built-in pressure dataset as an example dataset. It contains
observations of the vapor pressure of mercury over a range of temperatures.

# First six observations of the 'Pressure' dataset


head(pressure)
temperature pressure
1 0 0.0002
2 20 0.0012
3 40 0.0060
4 60 0.0300
5 80 0.0900
6 100 0.2700

To create a plot just specify the dataset in plot() function.

# Plot the 'pressure' dataset


plot(pressure)

Change the Shape and Size of the Points


You can use the pch (plotting character) argument to specify symbols to use
when plotting points.

Here’s a list of symbols you can use.


# Change the shape of the points
plot(pressure, pch=17)

For symbols 21 through 25, you can specify border color using col argument and
fill color using bg argument.

# Change the border color to blue and background color to lightblue


plot(pressure, pch=21, col="blue", bg="lightblue")
To alter the size of the plotted characters, use cex (character expansion)
argument.

# Scale the data points by 1.2


plot(pressure, cex=1.2)

Changing the Color


You can change the foreground color of symbols using the argument col.

# Change the color of symbols to red


plot(pressure, col="red")
R has a number of predefined colors that you can use in graphics. Use
the colors() function to get a complete list of available names for colors.

# List of predefined colors in R


colors()
[1] "white" "aliceblue" "antiquewhite"
[4] "antiquewhite1" "antiquewhite2" "antiquewhite3"
...

Or you can refer the following color chart.


You can specify colors by index, name, hexadecimal, or RGB value. For
example col=1, col="white", and col="#FFFFFF" are equivalent.

Different Plot Types


You can change the type of plot that gets drawn by using the type argument.

Here’s a list of all the different types that you can use.

Value Description

“p” Points
“l” Lines

“b” Both points and lines

“c” The lines part alone of “b”

“o” Both points and lines “overplotted”

“h” Histogram like (or high‐density) vertical lines

“s” Step plot (horizontal first)

“S” Step plot (vertical first)

“n” No plotting
For example, to create a plot with lines between data points, use type="l"; to draw
both lines and points, use type="b".

A series of graphics showing different types is shown below.

Adding Titles and Axis Labels


You can add your own title and axis labels easily by specifying following arguments.
Argument Description

main Main plot title

xlab x-axis label

ylab y-axis label

plot(pressure,
main = "Vapor Pressure of Mercury",
xlab = "Temperature (deg C)",
ylab = "Pressure (mm of Hg)")
The Axes Label Style
By specifying the las (label style) argument, you can change the axes label style.
This changes the orientation angle of the labels.

Value Description

0 The default, parallel to the axis

1 Always horizontal

2 Perpendicular to the axis

3 Always vertical

For example, to change the axis style to have all the axes text horizontal,
use las=1

plot(pressure, las = 1)
The Box Type
Specify the bty (box type) argument to change the type of box round the plot area.

Value Description

“o” (default) Draws a complete rectangle around the plot.

“n” Draws nothing around the plot.

“l”, “7”, “c”, “u”, or “]” Draws a shape around the plot area.

# Remove the box round the plot


plot(pressure, bty="n")

Add a Grid
The plot() function does not automatically draw a grid. However, it is helpful to the
viewer for some plots. Call the grid() function to draw the grid once you call
the plot().

plot(pressure)
grid()

Add a Legend
You can include a legend to your plot – a little box that decodes the graphic for
the viewer. Call the legend() function, once you call the plot().

# Add a legend to the top left corner


plot(pressure, col="red", pch=19)
points(pressure$temperature/2, pressure$pressure,col="blue", pch=17)
legend("topleft", c("line 1","line 2"), pch=c(19,17), col=c("red","blue"))

The position of the legend can be specified using the following keywords :
“bottomright”, “bottom”, “bottomleft”, “left”, “topleft”, “top”, “topright”, “right” and
“center”.

The effect of using each of these keywords is shown below.

Add Points to a Plot


You can add points to a plot with the points() function.
For example, let’s create a subset of pressure containing temperatures greater
than 200 °C and add these points to the plot.

plot(pressure, col = "red")


points(pressure[pressure$temperature > 200, ], col = "red", pch = 19)

Add Lines to a Plot


You can add lines to a plot in a very similar way to adding points, except that you
use the lines() function to achieve this.

plot(pressure)
lines(pressure$temperature/2, pressure$pressure)
You can change the line type using lty argument; and the line width
using lwd argument.

# Change the line type and line width


plot(pressure)
lines(pressure$temperature/2, pressure$pressure, lwd=2, lty=3)

Here’s a list of line types you can use.

There’s another function called abline() which allows you to draw horizontal,


vertical, or sloped lines.

# Draw a dotted horizontal line at 247 and vertical line at 300


plot(pressure)
abline(h= 247, v=300, col="red", lty=2)

Label Data Points


Use the text() function to add text labels at any position on the plot.

The position of the text is specified by the pos argument. Values of 1, 2, 3 and 4,


respectively places the text below, to the left of, above and to the right of the
specified coordinates.

# Add text labels above the coordinates


plot(pressure, pch=19, col="red")
text(pressure, labels=pressure$pressure, cex=0.7, pos=3, col="blue")
Set Axis Limits
By default, the plot() function works out the best size and scale of each axis to fit
the plotting area. However, you can set the limits of each axis quite easily
using xlim and ylim arguments.

# Change the axis limits so that the x-axis and y-axis ranges from 0 to 500
plot(pressure, ylim=c(0,500), xlim=c(0,500))

Display Multiple Plots on a Single Page


By using the mfrow graphics parameter, you can display multiple plots on the
same graphics page.

To use this parameter, you need to pass a two-element vector, specifying the
number of rows and columns. Then fill each cell in the matrix by repeatedly
calling plot.

For example, mfrow=c(1, 2) creates two side by side plots.

par(mfrow = c(1, 2))


plot(cars, main="Speed vs Distance", col="red")
plot(mtcars$mpg, mtcars$hp, main="HP vs MPG", col="blue")

Once your plot is complete, you need to reset your par() options. Otherwise, all
your subsequent plots will appear side by side.

# Reset the mfrow parameter


par(mfrow = c(1,1))

Save a Plot to an Image File


To save a plot to an image file, you have to do three things in sequence:

 Call a function to open a new graphics file, such as png(), jpg() or pdf().


 Call plot() to generate the graphics image.

 Call dev.off() to close the graphics file.

# Save a plot as a png file


png(filename="myPlot.png", width=648, height=432)
plot(pressure, col="slateblue1", pch=19, type="b",
main = "Vapor Pressure of Mercury",
xlab = "Temperature (deg C)",
ylab = "Pressure (mm of Hg)")
dev.off()

myPlot.png

You might also like