The for-loop statement repeats a command on your data a specific number of times that you set.
For-loops are useful when you need to repeat a manipulation or analysis on your data without
copying and rerunning the same code and risking mistakes. For example, if you have time series
data such as monthly climate variables (temperature, precipitation, etc.) for a particular location over a
30-year period (number of rows = 30*12 = 360) and you need the average for each year, you can
compute it with a for-loop in R.
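As a minimal sketch of that idea, the loop below computes a yearly mean temperature from 360 monthly values. The data frame clim, its column names, and the simulated values are placeholders invented for illustration; they are not data from this tutorial.

```r
# Hypothetical monthly data: 30 years x 12 months = 360 rows
set.seed(1)
clim <- data.frame(
  Year  = rep(1950:1979, each = 12),
  Month = rep(1:12, times = 30),
  Temp  = rnorm(360, mean = 10, sd = 5)   # simulated temperatures
)

years <- unique(clim$Year)
yearly_mean <- numeric(length(years))     # pre-allocate the result vector

for (i in seq_along(years)) {
  # keep only the 12 rows of the current year, then average them
  this_year <- clim[clim$Year == years[i], ]
  yearly_mean[i] <- mean(this_year$Temp)
}

head(data.frame(Year = years, Temp = yearly_mean))
```

Looping over unique(clim$Year) rather than a fixed range means the code does not need to know the first year in advance.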
You can also use a loop to populate a data set with values. For example, suppose you want
the first column to hold the numbers 1:10, the second column to hold the values of the first
multiplied by 10, and the third column to hold the sum of the first two.
First, create an empty data frame with 10 rows (nrow=10) and 3 columns (ncol=3):
dat=matrix(nrow=10,ncol=3)
dat=data.frame(dat)
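# a separate 10 x 4 data frame to fill with a nested loop
dat1=data.frame(matrix(nrow=10,ncol=4))
for (i in 1:10){          # i loops over the rows
  for (j in 1:4){         # j loops over the columns
    dat1[i,j]<-i*j        # each cell holds row number times column number
  }
}
fix(dat1)                 # open dat1 in R's data editor to inspect it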
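The three-column table described above (the numbers 1 to 10, those numbers times 10, and the sum of the two) could be filled with a single loop along these lines. This is only a sketch; the column names n, n_times_10, and total are made up here for readability.

```r
# Empty 10 x 3 data frame, built the same way as dat above
dat <- data.frame(matrix(nrow = 10, ncol = 3))
colnames(dat) <- c("n", "n_times_10", "total")   # hypothetical names

for (i in 1:10) {
  dat[i, 1] <- i                        # column 1: the numbers 1 to 10
  dat[i, 2] <- dat[i, 1] * 10           # column 2: column 1 multiplied by 10
  dat[i, 3] <- dat[i, 1] + dat[i, 2]    # column 3: sum of the first two columns
}

dat
```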
First, save your R workspace in the directory you want to work in, start R, and import BU.csv.
The code below creates a table whose number of rows is the number of rows in the original data
divided by 12, nrow=(nrow(BU)/12), since we want one summary per year, and which has one column
fewer than the original data, ncol=(ncol(BU)-1), since the month column is no longer needed. It then
assigns the original column names for columns 1 and 3:7, colnames(BU[,c(1,3:7)]), and writes the
years into the first column, paste(BU[(seq(1,nrow(BU),12)),1]). Inside the loop, it subsets the data for
each year, dat2=BU[BU$Year==(1949+i),], which assumes the first year in the data is 1950, and
calculates the mean of inverse SOI, the maximum of Tmin, the minimum of Prec, the mean of cloud
absence (100 minus Cloud), and the mean vapor pressure (the Hum column) for each year, putting each
value in its own column. A loop like this is useful if you don’t care about what happens in individual
months and only want to compare years.
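tbl<-matrix(nrow=(nrow(BU)/12),ncol=(ncol(BU)-1))
colnames(tbl)=colnames(BU[,c(1,3:7)])
# paste() returns text, so tbl becomes a character matrix;
# wrap values in as.numeric() before doing arithmetic on them
tbl[,1]=paste(BU[(seq(1,nrow(BU),12)),1])
for (i in 1:nrow(tbl)){
  dat2=BU[BU$Year==(1949+i),]        # rows for year i (first year assumed to be 1950)
  tbl[i,2]=mean(1/(dat2$SOI))        # mean of inverse SOI
  tbl[i,3]=max(dat2$Tmin)            # maximum of Tmin
  tbl[i,4]=min(dat2$Prec)            # minimum of Prec
  tbl[i,5]=mean(100-(dat2$Cloud))    # mean cloud absence
  tbl[i,6]=mean(dat2$Hum)            # mean of Hum
}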
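BU.csv is not reproduced here, so to try the loop one can build a stand-in. The sketch below assumes the layout implied by the code (Year, Month, SOI, Tmin, Prec, Cloud, Hum, with data starting in 1950); the name of the month column, the column order, the number of years, and all values are assumptions. It also stores the summaries in a data frame rather than a matrix, a swapped-in choice that keeps the numbers numeric.

```r
# Stand-in for BU.csv: 3 years of simulated monthly values (all hypothetical)
set.seed(42)
n_years <- 3
BU <- data.frame(
  Year  = rep(1950:(1949 + n_years), each = 12),
  Month = rep(1:12, times = n_years),
  SOI   = runif(12 * n_years, 0.5, 2),    # kept away from 0 so 1/SOI stays finite
  Tmin  = rnorm(12 * n_years, 5, 3),
  Prec  = runif(12 * n_years, 0, 200),
  Cloud = runif(12 * n_years, 0, 100),
  Hum   = runif(12 * n_years, 5, 20)
)

# One row per year; a data frame keeps every column numeric
yearly <- data.frame(Year = unique(BU$Year),
                     SOI = NA, Tmin = NA, Prec = NA, Cloud = NA, Hum = NA)

for (i in 1:nrow(yearly)) {
  dat2 <- BU[BU$Year == yearly$Year[i], ]     # rows of the current year
  yearly$SOI[i]   <- mean(1 / dat2$SOI)       # mean of inverse SOI
  yearly$Tmin[i]  <- max(dat2$Tmin)           # maximum of Tmin
  yearly$Prec[i]  <- min(dat2$Prec)           # minimum of Prec
  yearly$Cloud[i] <- mean(100 - dat2$Cloud)   # mean cloud absence
  yearly$Hum[i]   <- mean(dat2$Hum)           # mean of Hum
}

yearly
```

Matching on yearly$Year[i] instead of 1949+i avoids hard-coding the first year of the record.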
But if you want to calculate the mean of every variable for each year, you can use a nested loop.
In the loop below, the i loop populates each row and the j loop populates the columns within
each row. How would you change the code to get the average of each variable for
each of the months instead of the years?
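average<-matrix(nrow=(nrow(BU)/12),ncol=(ncol(BU)-1))
colnames(average)=colnames(BU[,c(1,3:7)])
average[,1]=paste(BU[(seq(1,nrow(BU),12)),1])   # years as text, so the matrix becomes character
for (i in 1:nrow(average)){
  dat2=BU[BU$Year==(1949+i),]            # subset once per year, outside the j loop
  for (j in 1:(ncol(average)-1)){
    average[i,j+1]=mean(dat2[,j+2])      # column j+2 of BU fills column j+1 of average
  }
}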
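One possible answer to the question above is to subset by month instead of by year. The sketch below assumes the second column of BU is named Month and holds the values 1 to 12, which the tutorial does not confirm; the small simulated BU is there only to make the example runnable.

```r
# Simulated stand-in for BU (column names and values are assumptions)
set.seed(7)
BU <- data.frame(Year = rep(1950:1952, each = 12), Month = rep(1:12, 3),
                 SOI = rnorm(36), Tmin = rnorm(36), Prec = runif(36),
                 Cloud = runif(36), Hum = runif(36))

# 12 rows (one per month); columns: Month plus the five variables
monthly <- matrix(nrow = 12, ncol = ncol(BU) - 1)
colnames(monthly) <- colnames(BU)[2:7]
monthly[, 1] <- 1:12

for (i in 1:12) {
  dat2 <- BU[BU$Month == i, ]              # month i from every year
  for (j in 1:(ncol(monthly) - 1)) {
    monthly[i, j + 1] <- mean(dat2[, j + 2])
  }
}

monthly
```

Only two things change relative to the yearly version: the table has 12 rows instead of one per year, and the subset condition tests the month column rather than Year.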