HINT

# : justification notes
Blue font : output from R
> : command

DESCRIPTIVE STATISTICS
A. For Categorical Data (1-Way Frequency Table)
> freq.gender=table(data$gender)
> freq.gender
Male Female <NA>
17 9 2

> percent.gender=prop.table(freq.gender)*100
> percent.gender
Male Female <NA>
60.714286 32.142857 7.142857
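
The same two steps can be tried without the course file. This is a minimal sketch: the vector `gender` is hypothetical, built only to reproduce the counts shown above. `table()` drops missing values by default, so `useNA = "ifany"` is used here to make the `<NA>` column appear; whether the original `data$gender` needed this cannot be told from the handout.

```r
# Hypothetical data: 17 males, 9 females, 2 missing (matches the counts above)
gender <- factor(rep(c("Male", "Female", NA), times = c(17, 9, 2)),
                 levels = c("Male", "Female"))

# Count each category; useNA = "ifany" keeps missing values as their own column
freq.gender <- table(gender, useNA = "ifany")

# Convert counts to percentages and round to 1 decimal place for reporting
percent.gender <- round(prop.table(freq.gender) * 100, 1)

print(freq.gender)
print(percent.gender)
```

Rounding at this stage is optional; it only makes the percentages easier to copy into a report.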

B. For Categorical Data (2-Way Frequency Table)


> freq.gender.race=table(data$gender,data$race) #table(row,column)
> freq.gender.race
       Malay Indian
Male      12      5
Female     6      3
<NA>       2      0

> percent.gender.race=prop.table(freq.gender.race,1)*100 #1=row %, 2=column %
> percent.gender.race
           Malay   Indian
Male    70.58824 29.41176
Female  66.66667 33.33333
<NA>   100.00000  0.00000

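
The second argument of `prop.table()` sets the denominator: 1 divides each cell by its row total, 2 by its column total. The sketch below shows both side by side. The vectors `gender` and `race` are hypothetical, constructed only to reproduce the counts in the table above.

```r
# Hypothetical data reproducing the 2-way counts shown above
gender <- factor(rep(c("Male", "Male", "Female", "Female", NA),
                     times = c(12, 5, 6, 3, 2)),
                 levels = c("Male", "Female"))
race <- factor(rep(c("Malay", "Indian", "Malay", "Indian", "Malay"),
                   times = c(12, 5, 6, 3, 2)),
               levels = c("Malay", "Indian"))

# table(row, column); useNA = "ifany" keeps the missing-gender row
freq <- table(gender, race, useNA = "ifany")

row.pct <- prop.table(freq, 1) * 100  # each row sums to 100
col.pct <- prop.table(freq, 2) * 100  # each column sums to 100

print(round(row.pct, 1))
print(round(col.pct, 1))
```

Row percentages answer "what share of males are Malay?", while column percentages answer "what share of Malay patients are male?". Pick the margin that matches the question being reported.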
C. For Numerical Data (1-Way Frequency Table)
> install.packages("psych") #Installing “psych” package in RStudio
> library(psych) #Load “psych” package
> desc.age=describe(data$ptage,IQR = TRUE)
> desc.age
   vars  n  mean    sd median trimmed  mad min max range skew kurtosis
X1    1 28 44.21 14.02     44   42.17 5.93  34 110    76 3.59    14.36
  se IQR
2.65  13

Check the distribution of a numerical variable using its skewness (skew) and kurtosis values.

As a rule of thumb, numerical data are treated as normally distributed if the skewness is between -1 and +1 (note 1) and the kurtosis is between -3 and +3 (note 2). If skewness, kurtosis, or both fall outside these ranges, the data are treated as not normally distributed. Report normally distributed data as mean(sd); otherwise report median(IQR). Here, ptage is not normally distributed because both its skewness (3.59) and its kurtosis (14.36) fall outside these ranges.
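
The rule of thumb above can be written as a small helper so that it is applied the same way to every variable. This is only a sketch: `report.as` is a hypothetical function name, not part of the psych package, and the cut-offs are the ones stated in the text.

```r
# Hypothetical helper: applies the rule of thumb from the text.
# Skewness within [-1, 1] and kurtosis within [-3, 3] -> report mean(sd);
# anything else -> report median(IQR).
report.as <- function(skew, kurtosis) {
  normal <- abs(skew) <= 1 & abs(kurtosis) <= 3
  ifelse(normal, "mean(sd)", "median(IQR)")
}

# Values taken from the describe() output for ptage (all patients)
print(report.as(3.59, 14.36))

# Values taken from the describeBy() output for ptage among females
print(report.as(-0.19, -1.94))
```

When skewness or kurtosis is `NaN` (as for a group whose values are all identical), the helper returns `NA`, which signals that the rule cannot be applied.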

Exercise
Please describe the height, weight, bmi, sysbp and diasbp.

D. For Numerical Data (2-Way Frequency Table)


> install.packages("psych") #Installing “psych” package in RStudio
> library(psych) #Load “psych” package
> desc.age.gender=describeBy(data$ptage,data$gender,IQR = TRUE)
> desc.age.gender
Descriptive statistics by group
group: Male
vars n mean sd median trimmed mad min max range skew kurtosis
X1 1 17 45.12 17.51 44 41.53 7.41 34 110 76 2.92 8.14
se IQR
4.25 10
-------------------------------------------------------------------------
group: Female
vars n mean sd median trimmed mad min max range skew kurtosis
X1 1 9 41.67 6.18 44 41.67 5.93 34 48 14 -0.19 -1.94
se IQR
2.06 13
-------------------------------------------------------------------------
group: NA
vars n mean sd median trimmed mad min max range skew kurtosis se IQR
X1 1 2 48 0 48 48 0 48 48 0 NaN NaN 0 0

Exercise
Please describe the height, weight, bmi, sysbp and diasbp among male and female.

1. Bulmer, M. G. (1979). Principles of Statistics. New York: Dover.
2. Balanda, K. P. and MacGillivray, H. L. (1988). Kurtosis: A Critical Review. The American Statistician, 42(2), 111–119.

DATA RESHAPING
A. Extract Rows and Columns
> percent.gender.race
Malay Indian
Male 70.58824 29.41176
Female 66.66667 33.33333
<NA> 100.00000 0.00000

> percent.gender.race[,] #All rows and all columns


Malay Indian
Male 70.58824 29.41176
Female 66.66667 33.33333
<NA> 100.00000 0.00000

> percent.gender.race[1,] #First row and all columns


Malay Indian
70.58824 29.41176

> percent.gender.race[,2] #All rows and second column


Male Female <NA>
29.41176 33.33333 0.00000

> percent.gender.race[1,2] #First row and second column


[1] 29.41176

> percent.gender.race[1:2,] #First to second rows and all columns


Malay Indian
Male 70.58824 29.41176
Female 66.66667 33.33333

> percent.gender.race[c(1,3),] #First and third rows and all columns


Malay Indian
Male 70.58824 29.41176
<NA> 100.00000 0.00000

> percent.gender.race[1,1:2] #First row and first to second columns


Malay Indian
70.58824 29.41176

> percent.gender.race[1:2,1:2] #First to second rows and first to second columns

Malay Indian
Male 70.58824 29.41176
Female 66.66667 33.33333

> percent.gender.race[,c(1,2)] # All rows and first and second columns


Malay Indian
Male 70.58824 29.41176
Female 66.66667 33.33333
<NA> 100.00000 0.00000
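
Rows and columns can also be picked by name instead of position, which keeps working if the table is later reordered. The sketch below uses a hypothetical matrix `pct` holding the percentages shown above; the row label "Missing" stands in for the `<NA>` row and is an assumption made for illustration.

```r
# Hypothetical copy of the row-percentage table (filled column by column)
pct <- matrix(c(70.58824, 66.66667, 100,
                29.41176, 33.33333, 0),
              nrow = 3,
              dimnames = list(c("Male", "Female", "Missing"),
                              c("Malay", "Indian")))

# Single cell by row name and column name
male.indian <- pct["Male", "Indian"]

# Several rows by name, all columns
male.female <- pct[c("Male", "Female"), ]

# One column, kept as a 3 x 1 matrix instead of being dropped to a vector
malay.col <- pct[, "Malay", drop = FALSE]

print(male.indian)
print(male.female)
print(malay.col)
```

`drop = FALSE` is useful when the result is passed on to `cbind()` or `rbind()`, because the row and column labels are preserved.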
B. Combine R Objects by Rows
> freq.gender
> percent.gender
> n.gender=rbind(freq.gender,percent.gender)
> n.gender
Male Female <NA>
freq.gender 17.00000 9.00000 2.000000
percent.gender 60.71429 32.14286 7.142857

C. Combine R Objects by Columns


> freq.gender.race
> percent.gender.race
> n.gender.race=cbind(freq.gender.race,percent.gender.race)
> n.gender.race
Malay Indian Malay Indian
Male 12 5 70.58824 29.41176
Female 6 3 66.66667 33.33333
<NA> 2 0 100.00000 0.00000

> n.gender.race=cbind("Malay.n"=freq.gender.race[,1],
                      "Malay.%"=percent.gender.race[,1],
                      "Indian.n"=freq.gender.race[,2],
                      "Indian.%"=percent.gender.race[,2])
> n.gender.race
Malay.n Malay.% Indian.n Indian.%
Male 12 70.58824 5 29.41176
Female 6 66.66667 3 33.33333
<NA> 2 100.00000 0 0.00000
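
For a report, counts and percentages are often shown together in one cell as "n (%)". One possible way to build such cells is sketched below; the matrix `n` is a hypothetical copy of the Male and Female counts above, and the "n (%)" layout is an assumption about the desired reporting style.

```r
# Hypothetical counts for the two non-missing gender groups
n <- matrix(c(12, 6, 5, 3), nrow = 2,
            dimnames = list(c("Male", "Female"), c("Malay", "Indian")))

# Row percentages, as in prop.table(freq.gender.race, 1) * 100
pct <- prop.table(n, 1) * 100

# Paste each count with its percentage, rounded to 1 decimal place.
# sprintf() returns a plain vector, so the matrix shape is restored afterwards.
n.pct <- matrix(sprintf("%.0f (%.1f)", n, pct),
                nrow = nrow(n), dimnames = dimnames(n))

print(n.pct, quote = FALSE)
```

The result is a character matrix, so it is meant for display or export only, not for further calculation.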

C. Combine Mean(SD) or Median(IQR)


> desc.age
vars n mean sd median trimmed mad min max range skew kurtosis
X1 1 28 44.21 14.02 44 42.17 5.93 34 110 76 3.59 14.36
se IQR
2.65 13

> desc.age.meansd=cbind("Mean"=desc.age$mean,"SD"=desc.age$sd)
> desc.age.meansd
Mean SD
[1,] 44.21429 14.02473

> desc.age.medianiqr=cbind("Median"=desc.age$median,"IQR"=desc.age$IQR)
> desc.age.medianiqr
Median IQR
[1,] 44 13
D. Combine Mean(SD) or Median(IQR) by groups
> desc.age.gender
Descriptive statistics by group
group: Male
vars n mean sd median trimmed mad min max range skew kurtosis
X1 1 17 45.12 17.51 44 41.53 7.41 34 110 76 2.92 8.14
se IQR
4.25 10
-------------------------------------------------------------------------
group: Female
vars n mean sd median trimmed mad min max range skew kurtosis
X1 1 9 41.67 6.18 44 41.67 5.93 34 48 14 -0.19 -1.94
se IQR
2.06 13
-------------------------------------------------------------------------
group: NA
vars n mean sd median trimmed mad min max range skew kurtosis se IQR
X1 1 2 48 0 48 48 0 48 48 0 NaN NaN 0 0

> str(desc.age.gender)
List of 3
$ Male :Classes ‘psych’, ‘describe’ and 'data.frame': 1 obs. of 14
variables:
..$ vars : num 1
..$ n : num 17 #desc.age.gender$Male$n
..$ mean : num 45.1 #desc.age.gender$Male$mean
..$ sd : num 17.5 #desc.age.gender$Male$sd
..$ median : num 44 #desc.age.gender$Male$median
..$ trimmed : num 41.5
..$ mad : num 7.41
..$ min : num 34
..$ max : num 110
..$ range : num 76
..$ skew : num 2.92
..$ kurtosis: num 8.14
..$ se : num 4.25
..$ IQR : num 10 #desc.age.gender$Male$IQR
$ Female:Classes ‘psych’, ‘describe’ and 'data.frame': 1 obs. of 14
variables:
..$ vars : num 1
..$ n : num 9 #desc.age.gender$Female$n
..$ mean : num 41.7 #desc.age.gender$Female$mean
..$ sd : num 6.18 #desc.age.gender$Female$sd
..$ median : num 44 #desc.age.gender$Female$median
..$ trimmed : num 41.7
..$ mad : num 5.93
..$ min : num 34
..$ max : num 48
..$ range : num 14
..$ skew : num -0.19
..$ kurtosis: num -1.94
..$ se : num 2.06
..$ IQR : num 13 #desc.age.gender$Female$IQR
$ NA :Classes ‘psych’, ‘describe’ and 'data.frame': 1 obs. of 14
variables:
..$ vars : num 1
..$ n : num 2
..$ mean : num 48
..$ sd : num 0
..$ median : num 48
..$ trimmed : num 48
..$ mad : num 0
..$ min : num 48
..$ max : num 48
..$ range : num 0
..$ skew : num NaN
..$ kurtosis: num NaN
..$ se : num 0
..$ IQR : num 0
- attr(*, "dim")= int 3
- attr(*, "dimnames")=List of 1
..$ group: chr [1:3] "Male" "Female" NA
- attr(*, "call")= language by.default(data = x, INDICES = group, FUN =
describe, type = type, IQR = TRUE)
- attr(*, "class")= chr [1:2] "psych" "describeBy"

> desc.age.gender$Male
vars n mean sd median trimmed mad min max range skew kurtosis
X1 1 17 45.12 17.51 44 41.53 7.41 34 110 76 2.92 8.14
se IQR
4.25 10

> desc.age.gender$Female
vars n mean sd median trimmed mad min max range skew kurtosis
X1 1 9 41.67 6.18 44 41.67 5.93 34 48 14 -0.19 -1.94
se IQR
2.06 13

> desc.age.male.meansd=cbind("Mean"=desc.age.gender$Male$mean,
"SD"=desc.age.gender$Male$sd)
> desc.age.male.meansd
Mean SD
[1,] 45.11765 17.50672

> desc.age.female.meansd=cbind("Mean"=desc.age.gender$Female$mean,
"SD"=desc.age.gender$Female$sd)
> desc.age.female.meansd
Mean SD
[1,] 41.66667 6.184658

> desc.age.gender.meansd=rbind(desc.age.male.meansd,
desc.age.female.meansd)
> desc.age.gender.meansd
Mean SD
[1,] 45.11765 17.506721
[2,] 41.66667 6.184658
> desc.age.male.medianiqr=
cbind("Median"=desc.age.gender$Male$median,
"IQR"=desc.age.gender$Male$IQR)
> desc.age.male.medianiqr
Median IQR
[1,] 44 10

> desc.age.female.medianiqr=
cbind("Median"=desc.age.gender$Female$median,
"IQR"=desc.age.gender$Female$IQR)
> desc.age.female.medianiqr
Median IQR
[1,] 44 13

> desc.age.gender.medianiqr=
rbind(desc.age.male.medianiqr,
desc.age.female.medianiqr)
> desc.age.gender.medianiqr
Median IQR
[1,] 44 10
[2,] 44 13
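
The group-by-group `cbind()` and `rbind()` steps above can be condensed with base R. The sketch below is an alternative route, not the handout's method: it uses `split()` and `sapply()` instead of `describeBy()`, and the vectors `age` and `gender` are hypothetical values, not the course data.

```r
# Hypothetical ages and genders; not the course data
age    <- c(34, 40, 44, 48, 110, 34, 44, 48)
gender <- factor(c(rep("Male", 5), rep("Female", 3)),
                 levels = c("Male", "Female"))

# Summary statistics for one group
summarise.one <- function(x) {
  c(Mean = mean(x), SD = sd(x), Median = median(x), IQR = IQR(x))
}

# split() makes one vector per gender; sapply() summarises each;
# t() turns the result so that each row is a group
desc.by.gender <- t(sapply(split(age, gender), summarise.one))

print(round(desc.by.gender, 2))
```

The rows are already labelled with the group names, so no separate renaming step is needed. Note that `split()` drops observations whose group is missing.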

E. Rename R Objects by Rows


> rownames(desc.age.gender.meansd)=c("Male","Female")
> desc.age.gender.meansd
Mean SD
Male 45.11765 17.506721
Female 41.66667 6.184658

F. Rename R Objects by Columns


> colnames(desc.age.gender.medianiqr)=c("Median","Interquartile Range")
> desc.age.gender.medianiqr
Median Interquartile Range
[1,] 44 10
[2,] 44 13
EXPORTING RESULTS
> write.table(desc.age.gender.meansd, #Object to be exported
"desc.age.gender.meansd.txt", #File name, in .txt format
sep="\t", col.names = NA) #Tab-separated; col.names = NA leaves a blank header cell above the row names
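
Before opening the file in Excel, it can be read back into R to confirm that it was written as expected. This is a minimal sketch: the matrix `meansd` is a hypothetical copy of the values shown earlier, and the file is written to a temporary folder (`tempdir()`) rather than the working directory.

```r
# Hypothetical copy of the Mean/SD table by gender
meansd <- matrix(c(45.11765, 41.66667, 17.506721, 6.184658), nrow = 2,
                 dimnames = list(c("Male", "Female"), c("Mean", "SD")))

# Write a tab-separated file; col.names = NA adds a blank header cell
# so that the column names line up with the data columns
out.file <- file.path(tempdir(), "desc.age.gender.meansd.txt")
write.table(meansd, out.file, sep = "\t", col.names = NA)

# Read it back, using the first column as row names
check <- read.delim(out.file, row.names = 1)
print(check)
```

If the printed table matches the original object, the file is ready to be opened in a spreadsheet program.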

IMPORTING RESULTS USING MICROSOFT OFFICE EXCEL


1. Open Microsoft Office Excel >> Click on Blank workbook
2. Go to File >> Open >> Browse >> Desktop
3. Double click on Berlin Study folder

[Screenshot: in the Open dialog, change the file type to All Files (1), double-click the file (2), then click through the remaining prompts (3 to 5).]