
ISOM 3390 Business Programming in R

Week 3 R packages and scripts


19 September 2022

Instructor Hyungsoo Lim



Plan for today

• data.table basics
• data.table
✓ Rows
✓ Columns
✓ Groups

• Exercise with two types of empirical datasets



Data.table in R

 Data.table is widely used



Data.table

• https://www.rstudio.com/resources/cheatsheets/

• Cheat sheet for data.table



Data.table basics



Data.table

 Create a data.table
• Use data.table function
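A minimal sketch of creating a data.table with the data.table() function. The column names and values are made up for illustration and do not come from the course data.

```r
library(data.table)

# Hypothetical toy data; column names are illustrative only.
dt <- data.table(
  id    = 1:5,
  group = c("a", "b", "a", "b", "a"),
  value = c(10, 20, 30, 40, 50)
)

print(dt)
```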



Data.table

 Create a data.table
• Convert a data.frame to a data.table
✓ Using as.data.table() or setDT()
✓ Check whether an object is a data.frame (or a data.table)
• is.data.frame() or is.data.table()
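The two conversion routes and the type checks can be sketched as follows. The data frame df is a made-up example. Note that as.data.table() returns a converted copy, while setDT() converts the object in place.

```r
library(data.table)

# Hypothetical data frame used only for illustration.
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

dt_copy <- as.data.table(df)        # returns a new data.table; df itself is unchanged
was_dt_before <- is.data.table(df)  # FALSE: df is still a plain data.frame

setDT(df)                           # converts df in place (by reference), no copy made
is.data.table(df)                   # TRUE
is.data.frame(df)                   # TRUE: a data.table is also a data.frame
```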



Data.table

 Subset of rows (refer to 2nd Lab session)


• Refer to columns by name directly instead of object$column
• No trailing comma (,) is needed when subsetting rows only
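A short comparison of row subsetting in base R and in data.table, using made-up data. Inside the brackets of a data.table, column names can be used as if they were variables.

```r
library(data.table)

# Hypothetical toy data for illustration.
dt <- data.table(
  id    = 1:6,
  group = rep(c("a", "b"), 3),
  value = c(5, 12, 8, 20, 15, 3)
)
df <- as.data.frame(dt)

# Base R: columns need the df$ prefix and the trailing comma is required.
base_rows <- df[df$value > 10 & df$group == "b", ]

# data.table: bare column names, no trailing comma needed.
big_b <- dt[value > 10 & group == "b"]

# Rows can also be selected by position.
rows_2_3 <- dt[2:3]
```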



Data.table

 Subset of rows (refer to 2nd Lab session)


• Dt[, .N] reports the number of rows; Dt[condition, .N] counts the rows that match a condition
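A small sketch of .N on made-up data: alone it counts all rows, combined with a row condition it counts the matching rows, and used in the row position it picks the last row.

```r
library(data.table)

# Hypothetical toy data for illustration.
dt <- data.table(
  id    = 1:6,
  group = rep(c("a", "b"), 3),
  value = c(5, 12, 8, 20, 15, 3)
)

n_all    <- dt[, .N]               # total number of rows
n_b      <- dt[group == "b", .N]   # number of rows where group is "b"
last_row <- dt[.N]                 # .N in the row position selects the last row
```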



Data.table

 Subset of rows



Data.table

 Subset of columns
• There are several ways to extract a subset of columns
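Some common ways to select columns, sketched on made-up data. The variable cols is a hypothetical helper holding column names.

```r
library(data.table)

# Hypothetical toy data for illustration.
dt <- data.table(
  id    = 1:6,
  group = rep(c("a", "b"), 3),
  value = c(5, 12, 8, 20, 15, 3)
)

as_vector <- dt[, value]              # a single column as a plain vector
one_col   <- dt[, .(value)]           # a single column as a data.table
by_list   <- dt[, .(id, value)]       # several columns with .() (alias for list())
by_names  <- dt[, c("id", "value")]   # several columns by character names

cols      <- c("id", "value")
by_prefix <- dt[, ..cols]             # names stored in a variable, using the .. prefix
by_with   <- dt[, cols, with = FALSE] # same result, older style
```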



Data.table

 Generate a new column


• Use “:=”
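A minimal sketch of adding one column with :=. The data and the new column name double_value are made up. The assignment modifies dt by reference, so no dt <- is needed.

```r
library(data.table)

# Hypothetical toy data for illustration.
dt <- data.table(id = 1:3, value = c(10, 20, 30))

# Add a new column in place; dt itself is modified.
dt[, double_value := value * 2]

print(dt)
```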



Data.table

 Generate new columns


• Use “`:=`”
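Adding several columns at once can be sketched in two equivalent styles: the functional form of `:=`, or a character vector of names on the left-hand side. All column names here are made up.

```r
library(data.table)

# Hypothetical toy data for illustration.
dt <- data.table(id = 1:3, value = c(10, 20, 30))

# Functional form: one name = expression pair per new column.
dt[, `:=`(value_sq = value^2, value_half = value / 2)]

# Alternative: names on the left, a list of values on the right.
dt[, c("value_lo", "value_hi") := .(value - 1, value + 1)]

print(dt)
```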



Data.table

 Generate a new column


• Condition
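One way to build a column from a condition is to put the condition in the row position and assign with :=, so only matching rows are filled. This is a sketch on made-up data; the labels "large" and "small" and the cutoff of 10 are assumptions for illustration. A vectorised ifelse() gives the same result in one step.

```r
library(data.table)

# Hypothetical toy data for illustration.
dt <- data.table(id = 1:6, value = c(5, 12, 8, 20, 15, 3))

# Assign only where the condition holds; other rows get NA.
dt[value > 10, size := "large"]
# Fill the remaining rows.
dt[is.na(size), size := "small"]

# Equivalent one-step version with ifelse().
dt[, size2 := ifelse(value > 10, "large", "small")]

print(dt)
```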



Data.table

 Use group by
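A minimal sketch of grouping with by on made-up data: the expression in the middle position is evaluated once per group.

```r
library(data.table)

# Hypothetical toy data for illustration.
dt <- data.table(
  id    = 1:6,
  group = rep(c("a", "b"), 3),
  value = c(5, 12, 8, 20, 15, 3)
)

# Sum of value and number of rows within each group.
res <- dt[, .(total = sum(value), n = .N), by = group]

print(res)
```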



Data.table

 How to rename columns?


• Use setnames()
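A short sketch of setnames() on made-up data. The old and new names are illustrative. Like :=, setnames() changes the table by reference.

```r
library(data.table)

# Hypothetical toy data for illustration.
dt <- data.table(id = 1:3, group = c("a", "b", "a"), value = c(10, 20, 30))

# Rename one column: old name, then new name.
setnames(dt, "value", "amount")

# Rename several columns at once with matching vectors of old and new names.
setnames(dt, c("id", "group"), c("user_id", "segment"))

names(dt)
```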



Data.table

 How to drop columns?


• Use “:=NULL”
• Use “`:=`” for dropping multiple columns
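Dropping columns can be sketched as follows on made-up data; the tmp columns are hypothetical. Assigning NULL removes a column by reference.

```r
library(data.table)

# Hypothetical toy data with throwaway columns.
dt <- data.table(id = 1:3, tmp1 = 1, tmp2 = 2, tmp3 = 3, tmp4 = 4, tmp5 = 5)

# Drop a single column.
dt[, tmp1 := NULL]

# Drop several columns by name.
dt[, c("tmp2", "tmp3") := NULL]

# Drop several columns with the functional form of `:=`.
dt[, `:=`(tmp4 = NULL, tmp5 = NULL)]

names(dt)
```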



Data.table

• Import ‘db_transaction.csv’
✓ transaction_id: each transaction’s unique id
✓ payment: payment method (e.g., cash, credit_card, debit_card, mobile, octopus)
✓ transaction_amount: amount of the transaction in HKD
✓ day, mon, year: transaction day, month, and year, respectively
✓ time: transaction time
✓ user_id: id of the user who made the transaction
✓ register: registration year of the user
✓ gender: gender of the user
✓ age: age of the user
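A sketch of importing a CSV with fread(). Because the course file is not available here, the example writes a stand-in file with two made-up rows that follow the column list above. With the real file the call would simply be fread("db_transaction.csv"). The object name tr_db follows the later slides.

```r
library(data.table)

# Stand-in file with two made-up rows so the sketch runs without the course data.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "transaction_id,payment,transaction_amount,day,mon,year,time,user_id,register,gender,age",
  "1,cash,120.5,3,9,2021,10:15:00,u01,2019,female,25",
  "2,octopus,35,4,9,2021,18:40:00,u02,2020,male,31"
), path)

# fread() detects the separator and column types and returns a data.table.
tr_db <- fread(path)

str(tr_db)
unlink(path)  # remove the temporary file
```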



Data.table

 How to make a tenure variable?
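One possible definition of tenure, assumed here since the slide does not spell it out, is the number of years between the user's registration year and the transaction year. The rows below are made up.

```r
library(data.table)

# Hypothetical rows using the year and register columns described earlier.
tr_db <- data.table(
  user_id  = c("u01", "u02"),
  year     = c(2021L, 2021L),
  register = c(2019L, 2020L)
)

# Assumption: tenure = transaction year minus registration year.
tr_db[, tenure := year - register]

print(tr_db)
```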



Data.table

 How to sort tr_db chronologically?
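A sketch of chronological sorting with setorder(), which reorders the table by reference. The rows are made up, and the time column is assumed to be zero-padded "HH:MM:SS" text so that it sorts correctly as a string.

```r
library(data.table)

# Hypothetical rows, deliberately out of order.
tr_db <- data.table(
  transaction_id = 1:4,
  year = c(2021L, 2020L, 2021L, 2021L),
  mon  = c(9L, 12L, 9L, 1L),
  day  = c(4L, 31L, 4L, 15L),
  time = c("18:40:00", "09:00:00", "08:05:00", "12:00:00")
)

# Sort by year, then month, then day, then time (all ascending).
setorder(tr_db, year, mon, day, time)

# A copy-returning alternative: tr_db[order(year, mon, day, time)]
print(tr_db)
```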



Data.table

 Chaining
• data.table[][][][]…
✓ Perform a sequence of data.table operations by chaining multiple “[]”

• What if we want to find users who have made transactions with all five payment methods?
✓ For example, to investigate whether the type of payment affects a user’s purchasing behavior
✓ Each such user has at least one transaction with every payment method
• Cash
• Octopus
• Credit_card
• Debit_card (EPS)
• Mobile (e.g., Alipay, WeChat)

• Manual approach
✓ Check every transaction by hand
✓ But there are 30,000 rows…

Data.table

 Chaining
• Semi-manual approach
✓ Check every transaction, but use a loop (e.g., for, while; will be covered later)
✓ Pick a user and check whether they have made transactions with all five methods, then repeat for every user
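The loop-based approach could look roughly like this. It is a sketch on made-up data, not the slide's original code. The payment labels are taken from the column description of db_transaction.csv.

```r
library(data.table)

# Hypothetical transactions: u01 uses all five methods, u02 only two.
tr_db <- data.table(
  user_id = c(rep("u01", 6), rep("u02", 3)),
  payment = c("cash", "octopus", "credit_card", "debit_card", "mobile", "cash",
              "cash", "cash", "octopus")
)

methods  <- c("cash", "octopus", "credit_card", "debit_card", "mobile")
all_five <- character(0)

# Visit each user in turn and check the set of payment methods they used.
for (u in unique(tr_db$user_id)) {
  used <- unique(tr_db[user_id == u, payment])
  if (all(methods %in% used)) {
    all_five <- c(all_five, u)
  }
}

all_five
```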



Data.table

 Chaining
• Chaining approach
✓ Group with by at the user and payment level
✓ Then group with by at the user level



Data.table

 Chaining
• Chaining approach
✓ Group with by at the user and payment level
✓ Then group with by at the user level
✓ All in a single line of code
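A sketch of how the two by steps can be chained, using made-up data. The first step leaves one row per user and payment method, the second counts those rows per user, and the final filter keeps users with all five. This is one plausible reading of the steps above, not necessarily the exact code from the lecture.

```r
library(data.table)

# Hypothetical transactions: u01 uses all five methods, u02 only two.
tr_db <- data.table(
  user_id = c(rep("u01", 6), rep("u02", 3)),
  payment = c("cash", "octopus", "credit_card", "debit_card", "mobile", "cash",
              "cash", "cash", "octopus")
)

# Step 1: transactions per user and payment method (one row per pair).
# Step 2: rows per user, i.e. number of distinct payment methods used.
# Step 3: keep users who used exactly five methods.
all_five <- tr_db[, .N, by = .(user_id, payment)][, .N, by = user_id][N == 5]

print(all_five)
```

A shorter variant of the same idea is tr_db[, .(n_methods = uniqueN(payment)), by = user_id][n_methods == 5].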



Data.table

 Other check points


• Individual variables
✓ Should be uniquely identified

• Numbers can be stored as numeric or character


✓ How to convert the type of a column?
✓ Dt[, x := as.numeric(x)]
✓ Dt[, y := as.character(y)]
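The two conversions can be tried on a small made-up table, checking the column types before and after with sapply(Dt, class).

```r
library(data.table)

# Hypothetical table: x holds numbers stored as text, y holds real numbers.
Dt <- data.table(x = c("1", "2", "3"), y = c(10, 20, 30))
sapply(Dt, class)   # x is character, y is numeric

Dt[, x := as.numeric(x)]     # text to numbers
Dt[, y := as.character(y)]   # numbers to text

sapply(Dt, class)   # x is numeric, y is character
```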

