You are on page 1of 13

Tecniche di Data Mining

R Introduction part 2

Ing. Danilo Lofaro

Ing. Danilo Lofaro Tecniche di Data Mining 1 / 13


Ing. Danilo Lofaro Tecniche di Data Mining 2 / 13
Create Data Frames

Dfrm <- data.frame(WC = Wingcrd, TS = Tarsus,HD = Head, W = Wt)


Dfrm

Dfrm <- data.frame(WC = Wingcrd, TS = Tarsus, HD = Head,


W = Wt, log_WC = log(Wingcrd))
Dfrm

Ing. Danilo Lofaro Tecniche di Data Mining 3 / 13


Remove vars

rm(Wt)

Ing. Danilo Lofaro Tecniche di Data Mining 4 / 13


Exercise 3

We continue with the Deer from Exercises 2. Make a data frame that contains all
the data presented in the table in Exercise 1. Suppose that you decide to square
root transform the length data. Add the square root transformed data to the data
frame.

Ing. Danilo Lofaro Tecniche di Data Mining 5 / 13


Inport TXT Data

Squid <- read.table(file = "squidGSI.txt",header = TRUE)

Ing. Danilo Lofaro Tecniche di Data Mining 6 / 13


CSV files

Squid_CSV <- read.csv("squid.csv", sep=";",header = T)

Ing. Danilo Lofaro Tecniche di Data Mining 7 / 13


Exercise 4

The file Deer.xls contains the deer data discussed in Exercise 1, but includes all
animals. Export the data in the Excel file to an ascii or csv file, and import it into R.

Ing. Danilo Lofaro Tecniche di Data Mining 8 / 13


Read the data

head(Squid)
names(Squid)
str(Squid)

Squid2 <- read.table(file = "squidGSI.txt", dec = ",",


header = TRUE)
str(Squid2) # Error

Ing. Danilo Lofaro Tecniche di Data Mining 9 / 13


formula and data argument

M1 <- lm(GSI ~ factor(Location) + factor(Year), data = Squid)

mean(GSI, data = Squid) # Error

boxplot(GSI ~ factor(Location), data = Squid)

boxplot(GSI, data = Squid) # Error

Ing. Danilo Lofaro Tecniche di Data Mining 10 / 13


The $ Sign

Squid$GSI
Squid[,6]
Squid[,"GSI"]
mean(Squid$GSI)

Ing. Danilo Lofaro Tecniche di Data Mining 11 / 13


Subsetting data (cont’d)

Squid$Sex
unique(Squid$Sex)

SquidM <- Squid[Squid$Sex == 1, ]


head(SquidM)
SquidF <- Squid[Squid$Sex == 2, ]
SquidF

Squid123 <- Squid[Squid$Location == 1 |


Squid$Location == 2 |
Squid$Location == 3, ]
Squid123 <- Squid[Squid$Location != 4, ]
Squid123 <- Squid[Squid$Location < 4, ]
Squid123 <- Squid[Squid$Location <= 3, ]
Squid123 <- Squid[Squid$Location >= 1 & Squid$Location <= 3, ]

SquidM.1 <- Squid[Squid$Sex == 1 & Squid$Location == 1,]


Ing. Danilo Lofaro Tecniche di Data Mining 12 / 13
Exercise 5

The file BirdFlu.xls contains the annual number of confirmed cases of human Avian
Influenza A/(H5N1) for several countries reported to the World Health
Organization (WHO). Prepare the spreadsheet and import these data into R. If you
are a non-Windows user, start with the file BirdFlu.txt. Note that you will need to
adjust the column names and some of the country names. Use the names and str
command in R to view the data. Print the number of bird flu cases in 2003.
What is the total number of bird flu cases in 2003 and in 2005?
Which country has had the most cases? Which country has had the least bird
flu deaths?
What is the total number of bird flu cases per country? What is the total
number of cases per year?

Ing. Danilo Lofaro Tecniche di Data Mining 13 / 13

You might also like