
HINT

# : comments (explanatory notes; R ignores everything after #)
Blue font : output from R (the lines printed below each command)
> : command

INTRODUCTION TO R

Toolbar buttons: Save R script, Run R script

Code Editor
➢ Write R script
➢ Edit R script
➢ Save R script

Environment
➢ All the variables are listed here

R Console
➢ Write a command line here
➢ PLEASE BEWARE!!
• A command line typed here will be lost once you close RStudio (write it in an R script to keep it)

CREATE PROJECT
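(Screenshots of steps 1 to 5 in RStudio are not reproduced here.)
• Step 4: Create the folder name for the project.
• Click “Browse” to set the directory in which the project folder is created.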
IMPORTING DATA FROM EXCEL TO R
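(Screenshots of steps 1 to 4 are not reproduced here.)
• Copy the “BerlinR.xlsx” data file and paste it into the Berlin Study project folder.
• Step 4: Create the data name, i.e. the name of the object that will hold the imported data (the commands below use data).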
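The import can also be scripted, so it is repeated every time the script runs. A minimal sketch, assuming the readxl package is installed (install.packages("readxl")) and BerlinR.xlsx sits in the Berlin Study project folder, which RStudio uses as the working directory. load_berlin is a hypothetical helper name, not part of any package; `<-` assigns just like the `=` used in the commands below.

```r
# Hypothetical helper: read the first sheet of the Excel file into a tibble.
# Assumes readxl is installed and the file is in the working directory.
load_berlin <- function(path = "BerlinR.xlsx") {
  if (!file.exists(path)) {
    stop("File not found: ", path, " (is it in the project folder?)")
  }
  readxl::read_excel(path)  # reads the first sheet by default
}

# Usage, once the file is in place:
# data <- load_berlin()
```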
CHECKING DATA
> names(data) #list all variable names
[1] "patientid" "date" "enddate" "ptage" "ptgender" "ptrace" "q1"
[8] "q2" "q3" "q4" "q5" "height" "weight" "sysbp"
[15] "diasbp"

> str(data) #display the dataset structure
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 28 obs. of 15 variables:
$ patientid: num 1 1 2 3 4 5 6 7 8 9 ...
$ date : POSIXct, format: "2009-05-13" "2009-06-17" "2009-06-17" "2009-06-17" ...
$ enddate : POSIXct, format: "2009-12-13" "2009-12-13" "2009-12-13" "2009-12-13" ...
$ ptage : num 48 44 35 44 45 48 34 39 48 35 ...
$ ptgender : num 1 2 1 1 2 2 2 2 2 1 ...
$ ptrace : num 3 1 1 1 1 1 1 3 3 1 ...
$ q1 : num 1 1 1 1 1 1 1 1 1 1 ...
$ q2 : num 2 2 4 2 2 2 4 1 2 4 ...
$ q3 : num 1 2 1 2 1 1 1 1 1 1 ...
$ q4 : num 1 1 1 1 1 1 1 1 1 1 ...
$ q5 : num 1 4 1 4 4 3 3 5 1 1 ...
$ height : num 1.6 1.67 1.72 1.67 1.68 NA 1.65 1.68 1.6 1.72 ...
$ weight : num 77 74 128 74 78.5 NA 100 194 77 128 ...
$ sysbp : num 117 123 191 123 107 120 122 140 1 191 ...
$ diasbp : num 78 80 109 80 91 78 73 90 78 109 ...

> nrow(data) #number of rows (observations/samples)
[1] 28

> ncol(data) #number of columns (variables)
[1] 15
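To practise these checks without the Excel file, a small stand-in data frame can be built from the first five values that str(data) lists for three of the columns (toy is a hypothetical name). dim() returns the number of rows and columns together, the same two numbers nrow() and ncol() give separately.

```r
# Hypothetical stand-in for `data`: first five values of three columns
# as listed by str(data) above.
toy <- data.frame(
  patientid = c(1, 1, 2, 3, 4),
  ptage     = c(48, 44, 35, 44, 45),
  height    = c(1.6, 1.67, 1.72, 1.67, 1.68)
)

names(toy)  # "patientid" "ptage" "height"
dim(toy)    # rows and columns in one call: 5 3
nrow(toy)   # 5
ncol(toy)   # 3
```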
DATA MANAGEMENT
A. Label the coded variables in our dataset
> table(data$ptgender)
1 2 3
17 9 2

> data$gender=factor(data$ptgender, #create new variable, i.e. gender
levels=c(1,2), #codes to label: 1=Male, 2=Female
labels=c("Male","Female")) #code 3 (Missing) matches no level, so it becomes NA

> table(data$ptgender,data$gender,useNA="ifany") #useNA="ifany" also counts the missing values
Male Female <NA>
1 17 0 0
2 0 9 0
3 0 0 2
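Giving NA as a factor label makes NA a real level of the factor, so is.na() does not flag those entries as missing. A small sketch with hypothetical codes (same coding as ptgender: 1=Male, 2=Female, 3=Missing) compares that with leaving code 3 out of levels, which turns it into a genuinely missing value.

```r
# Hypothetical gender codes: 1 = Male, 2 = Female, 3 = Missing
codes <- c(1, 2, 3, 1, 3)

# Option 1: NA given as a label. NA becomes an actual level of the factor,
# so is.na() does not flag those entries.
with_na_label <- factor(codes, levels = c(1, 2, 3),
                        labels = c("Male", "Female", NA))

# Option 2: code 3 left out of `levels`. Values matching no level
# become NA, i.e. genuinely missing.
dropped <- factor(codes, levels = c(1, 2), labels = c("Male", "Female"))

sum(is.na(with_na_label))        # 0
sum(is.na(dropped))              # 2
table(dropped, useNA = "ifany")  # missing values counted under <NA>
```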

Exercise
Please label the variables ptrace, q1, q2, q3, q4 and q5 in the same way.

B. Compute variables
> data$bmi=data$weight/(data$height^2)
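R applies arithmetic to whole columns element by element, and a missing height or weight gives a missing BMI. A sketch using the first, second and sixth height and weight values from str(data), assuming weight is in kilograms and height in metres:

```r
# First, second and sixth values of height and weight from str(data)
height <- c(1.6, 1.67, NA)
weight <- c(77, 74, NA)

bmi <- weight / height^2  # element-by-element: weight divided by height squared
round(bmi, 1)             # 30.1 26.5 NA
```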

C. Recode into different variables
> data$bmi_cat=NA #start with every case missing
> data$bmi_cat[data$bmi<=18.5]=1
> data$bmi_cat[data$bmi>18.5 & data$bmi<25]=2 #upper bounds use < so values such as 24.95 are not skipped
> data$bmi_cat[data$bmi>=25 & data$bmi<30]=3
> data$bmi_cat[data$bmi>=30]=4
> table(data$bmi_cat)
3 4
8 16
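cut() can assign all four categories in one call. A sketch on hypothetical BMI values: with right = FALSE the intervals are closed on the left ([18.5, 25), [25, 30), ...), which leaves no gaps between categories. Note that a BMI of exactly 18.5 then falls into category 2 rather than category 1 as in the commands above, and the result is a factor rather than a number.

```r
# Hypothetical BMI values, including boundary cases and a missing value
bmi <- c(17.9, 18.5, 22.0, 24.95, 25.0, 29.9, 30.08, NA)

bmi_cat <- cut(bmi,
               breaks = c(-Inf, 18.5, 25, 30, Inf),
               labels = 1:4,
               right  = FALSE)  # intervals are [a, b): closed on the left

table(bmi_cat, useNA = "ifany")
```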

D. Calculate the number of days between two dates
> data$days=as.numeric(difftime(data$enddate,data$date,units="days")) #divide by 365.25 for years
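Subtracting two date-times returns a difftime object whose unit R chooses automatically (seconds, minutes, hours or days) from the size of the smallest difference, so dividing by 365.25 only gives years when that unit happens to be days. difftime() with units = "days" fixes the unit. A sketch with the first date and enddate shown by str(data), assuming UTC times:

```r
# First date and enddate from str(data), read as UTC date-times
date    <- as.POSIXct("2009-05-13", tz = "UTC")
enddate <- as.POSIXct("2009-12-13", tz = "UTC")

days  <- as.numeric(difftime(enddate, date, units = "days"))
years <- days / 365.25

days            # 214
round(years, 2)
```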

E. Identify duplicate cases
> data$patientid[duplicated(data$patientid)] #IDs that occur more than once
[1] 1 14
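duplicated() marks only the second and later occurrences of a value, so the command above lists a repeated ID once per extra copy. A sketch with a hypothetical patientid vector shows how to get each repeated ID once with unique() and how to flag every row of those IDs, first copies included:

```r
# Hypothetical IDs, shaped like data$patientid
patientid <- c(1, 1, 2, 3, 4, 14, 14)

duplicated(patientid)  # TRUE only for the second and later copies

dup_ids <- unique(patientid[duplicated(patientid)])
dup_ids  # 1 14

# Flag every row belonging to a duplicated ID (first copies included)
all_dups <- patientid %in% dup_ids
which(all_dups)  # 1 2 6 7

# With the real dataset: data[data$patientid %in% dup_ids, ]
```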
