
“Business Analytics”

Today's Objective

Handling pivot tables in R, case exercises (pivot-table & espn), a special method of pivoting using the melt function, Power Pivot (R), and Python

Next session (2-3 sessions)

Unsupervised Learning
Retail analytics, market basket analysis (case study), then association mining and rule generation

Indian Institute of Management (IIM),Rohtak


Pivot Table In R



Import Data (input.csv)

• data <- read.csv("C:/Users/admin/Desktop/input.csv")   # full path to the file

• data <- read.csv("input.csv")   # file in the working directory

• data <- read.csv(file.choose(), header = TRUE)   # pick the file interactively


Excel file
library("readxl")
my_data <- read_excel("my_file.xls")
my_data <- read_excel(file.choose())
# Specify sheet by its name
my_data <- read_excel("my_file.xlsx", sheet = "data")
# Specify sheet by its index
my_data <- read_excel("my_file.xlsx", sheet = 2)
Import from Desktop
The function read.table() reads a text file directly into a data frame

> airqual <- read.table("C:/Desktop/airquality.txt")

Similarly, read.csv() reads a .csv file directly into a data frame

> airqual <- read.csv("C:/Desktop/airquality.csv")

Excel:
library("readxl")
airqual <- read_excel("C:\\Users\\admin\\Desktop\\BA_Gradesheet.xlsx")
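The import calls above can be tried without any file on disk. The sketch below writes a tiny data frame to a temporary CSV and reads it back; the column names and values are made up for illustration and are not taken from airquality.csv.

```r
# Write a tiny data frame to a temporary CSV, then read it back.
# Column names and values are illustrative only.
tmp <- tempfile(fileext = ".csv")
toy <- data.frame(Ozone = c(41, 36, 12), Month = c(5, 5, 6))
write.csv(toy, tmp, row.names = FALSE)

airqual <- read.csv(tmp)
str(airqual)    # check column names and types after import
head(airqual)   # look at the first rows
```

Checking str() right after an import is a quick way to catch columns that were read as text instead of numbers.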
Pivot Table

library(rpivotTable)
espn = read.csv("espn.CSV", header=TRUE)

or espn = read.csv("espn.CSV")   # header=TRUE is the default for read.csv

rpivotTable(espn)


Pivot Table
library(rpivotTable)
espn = read.csv("espn.CSV", header=TRUE)
rpivotTable(objectname, rows=" ", cols=" ", aggregatorName=" ",
vals=" ", rendererName="Table")

rpivotTable(espn, rows="Gender",
cols="Location",
aggregatorName="Average",
vals="Income",
rendererName="Table")
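rpivotTable draws an interactive widget, so it can help to cross-check a cell with plain base R. The sketch below uses tapply() on a toy data frame; the column names follow the slide, but the rows are made up and stand in for espn.csv.

```r
# Toy stand-in for espn.csv; the real file's values are not reproduced here.
espn_toy <- data.frame(
  Gender   = c("f", "m", "f", "m", "f", "m"),
  Location = c("urban", "urban", "rural", "rural", "urban", "rural"),
  Income   = c(50, 60, 30, 40, 70, 20)
)

# Rows = Gender, columns = Location, cell = mean Income
avg_income <- with(espn_toy, tapply(Income, list(Gender, Location), mean))
avg_income
```

This mirrors rows="Gender", cols="Location", aggregatorName="Average", vals="Income" as a static matrix.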


Pivot Table

Case: espn.csv
Q1. Analyze the gender of subscribers in terms of percentage

library(rpivotTable)
espn = read.csv("espn.CSV")
rpivotTable(espn, rows="Gender",
aggregatorName="Count as Fraction of Total", rendererName="Table")
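"Count as Fraction of Total" is a count per category divided by the overall count. A minimal base-R sketch of the same arithmetic, on a made-up gender column:

```r
# Toy gender column; values are illustrative only.
gender <- c("m", "f", "m", "m", "f")

counts <- table(gender)        # count per category
shares <- prop.table(counts)   # each count divided by the total
round(100 * shares, 1)         # expressed as percentages
```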




Case: espn.csv

Q2. Analyze the age of subscribers in terms of percentage
rpivotTable(espn,
rows="Age",
aggregatorName="Count as Fraction of Total")


Case: espn.csv

Q3. Find the total number of persons in each location, broken down by gender
rpivotTable(espn, rows="Location", cols="Gender",
aggregatorName="Count", rendererName="Table")


Pivot Table
Q4. Analyze subscribers' locations in terms of percentage
library(devtools)
library(rpivotTable)
rpivotTable(espn, cols="Location",
aggregatorName="Count as Fraction of Total", rendererName="Table")


Q5. Analyze age against income, with location, in percentage (income in columns, age in rows)
library(devtools)
library(rpivotTable)
rpivotTable(espn, cols="Income",
rows="Age",
aggregatorName="Count as Fraction of Total", vals="Location",
rendererName="Table")


Pivot Table
Case 2 (pivot-table.csv)

Q1. To analyze the product sales on a given date

Pivot Table
Q1. To analyze which one is the main product
pivot = read.csv("pivot-table.csv")
rpivotTable(pivot, rows="Product",
cols="Category")


Pivot Table
Q1. To analyze which one is the main product
pivot = read.csv("pivot-table.csv")

rpivotTable(pivot, rows="Product", cols="Category",
aggregatorName="Sum", vals="Amount")
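The Sum aggregation above can be cross-checked in base R with xtabs(). This is only a sketch on made-up rows; the column names follow the slides, while the products and amounts are illustrative.

```r
# Toy stand-in for pivot-table.csv; the rows are made up.
pivot_toy <- data.frame(
  Product  = c("Apple", "Banana", "Carrots", "Apple", "Carrots"),
  Category = c("Fruit", "Fruit", "Vegetables", "Fruit", "Vegetables"),
  Amount   = c(100, 50, 30, 200, 70)
)

# Sum of Amount with Product as rows and Category as columns
sum_tab <- xtabs(Amount ~ Product + Category, data = pivot_toy)
sum_tab

# Product with the largest total amount
names(which.max(rowSums(sum_tab)))
```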


Pivot Table

Q2. To analyze the number of orders country-wise with product
rpivotTable(pivot, rows="Country",
cols="Product")
rpivotTable(pivot, rows="Country", cols="Product",
aggregatorName="Count", vals="Amount",
rendererName="Table")


Pivot Table

Q3. To find the difference between vegetables and fruit

rpivotTable(pivot, rows="Product", cols="Category")

Pivot Table

Q3. To find the difference between vegetables and fruit
rpivotTable(pivot, rows="Product", cols="Category",
aggregatorName="Sum", vals="Amount")


Pivot Table
Q4. Country-wise % distribution of products
rpivotTable(pivot, rows="Country",
aggregatorName="Sum as Fraction of Total", vals="Amount")

Pivot Table
Q5. Price difference between Australian Banana and Canadian Apple
rpivotTable(pivot, cols="Product", rows="Country",
aggregatorName="Sum", vals="Amount")


Viewing the imported data
First import the pivot-table.xlsx file into the R environment

library("readxl")

pivot_table <- read_excel("pivot-table.xlsx")

pivot_table <- read_excel(file.choose())

pivot_table <- read_excel("C:\\Users\\admin\\Desktop\\pivot-table.xlsx")


Viewing the imported data

View the pivot table in the source editor


reshape Package
• We build the pivot table with the reshape package, which contains the melt and cast functions
• Go to the Packages tab of the output/packages pane in RStudio
• Click Install, type reshape in the Packages field, and click Install


Install Completion

Load the Package
Click on the reshape package check box in the Packages tab


melt function: Example from our dataset
• The dataset contains 6 columns
• The combination of all the columns except Amount identifies each order
• These columns are all non-numeric, and there are no calculations we can do on them except, perhaps, counting their frequency
• In melt's terminology, Order ID, Product, Category, Date, and Country are id variables, while Amount, which is numeric and is the one we would like to sum up in our pivot table, is a measure
Dataset
[Figure: the dataset, with the id columns and the measure column marked]
Melting the data
• When melting your data, you can indicate multiple id variables as well as multiple measure variables
• melt transforms a data frame from its original format to a so-called long format, where all the observed variables (called measures) appear, together with their respective values, in two adjacent columns named variable and value
• Each row of this new format is identified by a unique combination of the id variables, which are also part of the original data frame
• The id variables appear as-is in the melted format, while the measure variables are stacked in the variable column with their respective values in the value column
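To make the long format concrete, the sketch below builds it by hand in base R rather than calling melt, so it runs without the reshape package. The data are made up, and Units is a hypothetical second measure added only to show the stacking.

```r
# Toy wide table: two id columns and two measure columns (made-up values).
wide <- data.frame(
  Product = c("Apple", "Banana"),
  Country = c("Canada", "Australia"),
  Amount  = c(100, 50),
  Units   = c(10, 5)     # hypothetical second measure
)

id_cols      <- c("Product", "Country")
measure_cols <- c("Amount", "Units")

# Stack each measure under the same id columns: one block of rows per measure
long <- do.call(rbind, lapply(measure_cols, function(m) {
  data.frame(wide[id_cols], variable = m, value = wide[[m]])
}))
long
```

Each original row now appears once per measure, with the measure's name in variable and its number in value.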
Melting the data
• pivot_table.m <- melt(pivot_table, id=c(1:3,5:6), measure=c(4))
• pivot_table.m

Melting the data into ids and measures



Casting the data
• Casting builds the pivot table
• Besides a reference to the melted data, cast requires us to indicate how we want to re-aggregate the values
• Problem statement: we want to see the total amount for each product
• The basic formula: what you want as rows ~ variable, followed by the summarization function


Casting the data
Question: Which is the most exported product?
• pivot_table.c <- cast(pivot_table.m, Product ~ variable, sum)
• pivot_table.c

Product as row and variable (i.e. Amount) as column

Aggregating the Amount by product
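The same "rows ~ variable, then a summary function" idea can be sketched in base R with aggregate(). This is a stand-in for the cast call, not the reshape API, and the melted rows below are made up.

```r
# Toy melted data in the (id, variable, value) layout; values are made up.
melted <- data.frame(
  Product  = c("Apple", "Banana", "Apple", "Banana", "Apple"),
  variable = "Amount",
  value    = c(100, 50, 200, 25, 10)
)

# One row per Product, summing value
totals <- aggregate(value ~ Product, data = melted, FUN = sum)
totals

# Product with the largest total
totals$Product[which.max(totals$value)]
```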


Adding the grand total row
Question: What is the total amount of exports?
• pivot_table.c <- cast(pivot_table.m, Product ~ variable, sum, margins=c("grand_row"))
• pivot_table.c
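A grand total row is simply the sum of the per-group totals appended as one extra entry. A base-R sketch of that step on made-up amounts, with the extra entry named "(all)" to match the label cast prints:

```r
# Toy data; the amounts are illustrative, not the course dataset.
sales <- data.frame(
  Product = c("Apple", "Apple", "Banana", "Apple"),
  Amount  = c(100, 200, 50, 25)
)

by_product <- tapply(sales$Amount, sales$Product, sum)   # total per Product
with_total <- c(by_product, "(all)" = sum(by_product))   # append the grand total
with_total
```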


Filter by Country
Question: Which product do we export most to the United States?
• pivot_table.c <- cast(pivot_table.m, Product ~ variable, sum, margins=c("grand_row"), subset=(Country=="United States"))
• pivot_table.c

Filter criteria: the subset argument keeps only the rows where Country is "United States"


Filter with the average amount
Question: What is the average export amount to the United States?
• pivot_table.c <- cast(pivot_table.m, Product ~ variable, margins=c("grand_row"), mean, subset=(Country=="United States"))
• pivot_table.c

With mean as the aggregation function, each row shows the mean amount for that product


Filter with the average amount

Question: What is the average export amount to the United States for both categories?
> pivot_table.c <- cast(pivot_table.m, Category ~ variable, margins=c("grand_row"), mean, subset=(Country=="United States"))
> pivot_table.c
    Category   Amount
1      Fruit 4213.595
2 Vegetables 6010.800
3      (all) 4686.544


Question: Category wise total and grand total

pivot_table.c <-cast(pivot_table.m,Category
~ variable,sum,margins = c("grand_row"))
pivot_table.c

Category Amount
1 Fruit 693069
2 Vegetables 336665
3 (all) 1029734



Percentage Analysis of each Country
Question: What is the export percentage country-wise?
• pivot_table.c <- cast(pivot_table.m, Country ~ variable, sum)
• pivot_table.c
• pivot_table.c$Amount_pct <- pivot_table.c$Amount / sum(pivot_table.c$Amount) # each country's share of the total amount (multiply by 100 for a percentage)
• pivot_table.c

Shows each country's amount as a share of the total


Counting number of the exports by products
Question: How many exports were made of each product?
• pivot_table.c <- cast(pivot_table.m, Product ~ variable, margins=c("grand_row"))
• pivot_table.c

When the aggregation function is omitted, cast uses length (a count) by default


Multi-level Pivot Table
Question: What are the export totals category-wise and country-wise?
• pivot_table.c <- cast(pivot_table.m, Category+Country ~ variable, margins=c("grand_row"), sum)
• pivot_table.c

Category-wise analysis with Country details

• pivot_table.c <- cast(pivot_table.m, Category+Country+Product ~ variable, margins=c("grand_row"), sum)
Multi-level Pivot Table
Question: Product-wise analysis with country
• pivot_table.c <- cast(pivot_table.m, Product+Country ~ variable, margins=c("grand_row"), sum)
• pivot_table.c

Product-wise analysis with Country details
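A multi-level pivot puts two or more id variables on the row side. In base R the same grouping can be sketched with aggregate() and a two-term formula; the rows below are made up.

```r
# Toy data with made-up rows.
orders <- data.frame(
  Category = c("Fruit", "Fruit", "Vegetables", "Fruit", "Vegetables"),
  Country  = c("Canada", "Canada", "Canada", "Australia", "Australia"),
  Amount   = c(100, 200, 50, 25, 75)
)

# One row per Category/Country pair, summing Amount
multi <- aggregate(Amount ~ Category + Country, data = orders, FUN = sum)
multi[order(multi$Category, multi$Country), ]
```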


Two-dimensional Pivot Table

Question: Compare the amount of Apple exports to Australia and Canada
• pivot_table$value <- pivot_table$Amount # cast aggregates the column named value, so copy Amount into it; this lets us cast the original (unmelted) data frame into a two-dimensional table
• cast(pivot_table, Country ~ Product, sum) # total amount for each pair of country and product


Two-dimensional Pivot Table
Question: Compare the number of Apple exports to Australia and Canada
• cast(pivot_table, Country ~ Product, length) # number of exports (count) for each pair of country and product

Two-dimensional Pivot Table
Question: Compare the average amount of Apple exports to Australia and Canada
• cast(pivot_table, Country ~ Product, mean) # average amount for each pair of country and product

NaN (Not a Number) appears because some country/product pairs have no exports, so the mean is taken over zero values
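The NaN is ordinary R behaviour for the mean of an empty vector (zero divided by zero), which is easy to confirm in the console:

```r
no_exports <- numeric(0)   # a country/product pair with no rows

mean(no_exports)     # NaN: the sum (0) divided by the count (0)
sum(no_exports)      # 0, so a sum pivot shows 0 for the same pair
length(no_exports)   # 0, so a count pivot shows 0 as well
```

This is why the sum and length tables show 0 for the same pairs where the mean table shows NaN.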
Exporting the pivot table to xls

Check for the xls file named “pivot_countryProduct” in the Documents folder (working directory)
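The export call itself is not shown on this slide. One base-R possibility, assuming a CSV file (which Excel opens directly) is acceptable in place of a native xls, is write.csv(). The table below is made up, and the file goes to a temporary folder so the sketch leaves nothing behind in the working directory.

```r
# Toy two-dimensional table (Country x Product); values are made up.
pivot_countryProduct <- data.frame(
  Country = c("Australia", "Canada"),
  Apple   = c(120, 80),
  Banana  = c(60, 0)
)

# Written to a temporary folder here; replace tempdir() with getwd()
# to place the file in the working directory instead.
out_file <- file.path(tempdir(), "pivot_countryProduct.csv")
write.csv(pivot_countryProduct, out_file, row.names = FALSE)

file.exists(out_file)   # confirm the file was created
read.csv(out_file)      # read it back to confirm the contents
```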


Case (espn.xlsx)
• Q1. Analyze the number of males and females location-wise.
• Q2. Analyze the average income gender-wise.
• Q3. Analyze the average income gender-wise and location-wise.
• Q4. Analyze subscribers' locations in terms of percentage.
• Q5. Analyze the average age of subscribers gender-wise and location-wise.
• Q6. Create a two-dimensional pivot table of average income with Gender and Location.
• Q7. Analyze the gender of subscribers in terms of percentage.
Q1: Analyze the number of males and females location-wise
• espn.m <- melt(espn, id=c(2:4), measure=c(1))
• espn.c <- cast(espn.m, Gender + Location ~ variable, length)
• espn.c

Q2: Analyze the average income gender-wise
• espn.m <- melt(espn, id=c(1:2,4), measure=c(3))
• espn.c <- cast(espn.m, Gender ~ variable, mean)
• espn.c

Q3: Analyze the average income gender-wise and location-wise
• espn.m <- melt(espn, id=c(1:2,4), measure=c(3))
• espn.c <- cast(espn.m, Gender + Location ~ variable, mean)
• espn.c


Q4: Analyze subscribers' locations in terms of percentage
• espn.m <- melt(espn, id=c(1:3), measure=c(4))
• espn.c <- cast(espn.m, variable ~ value, length)
• espn.c$rural_pct <- espn.c$rural / sum(espn.c$rural, espn.c$suburban, espn.c$urban)
• espn.c$suburban_pct <- espn.c$suburban / sum(espn.c$rural, espn.c$suburban, espn.c$urban)
• espn.c$urban_pct <- espn.c$urban / sum(espn.c$rural, espn.c$suburban, espn.c$urban)
• espn.c

Q5: Analyze the average age of subscribers gender-wise and location-wise
• espn.m <- melt(espn, id=c(2:4), measure=c(1))
• espn.c <- cast(espn.m, Gender + Location ~ variable, mean)
• espn.c

Q6: Create a two-dimensional pivot table of average income with Gender and Location
• espn.m <- melt(espn, id=c(1,2,4), measure=c(3))
• espn.c <- cast(espn.m, Gender ~ Location, mean)
• espn.c


Q7: Analyze the gender of subscribers in terms of percentage
Load espn.xlsx into R
espn.m <- melt(espn, id=c(1,3:4), measure=c(2)) # We need to count the Gender column, hence it is the measure
espn.c <- cast(espn.m, variable ~ value, length) # Shows the count of females and males
espn.c$f_pct <- espn.c$f / sum(espn.c$f, espn.c$m) # Share of female subscribers
espn.c$m_pct <- espn.c$m / sum(espn.c$f, espn.c$m) # Share of male subscribers
espn.c


More than one Table

Pivoting from multiple Excel data files

• Step 1: Combine the multiple Excel data files in R

More than one Table

Install the library ‘base’ in R

More than one Table

Usage of read.excel file


More than one Table
You have three different tables:
1. Data.xlsx
2. Dept.xlsx
3. Faculty.xlsx

More than one Table

Questions:
Calculate the department-wise salary total
Calculate the level-wise salary total
Calculate the department-wise and level-wise salary total
Calculate the travel expenses of each department
Many more ...


Viewing the imported data
First import the Data.xlsx file into the R environment

library("readxl")

Data <- read_excel("Data.xlsx")

Data <- read_excel(file.choose())

Data <- read_excel("C:\\Users\\admin\\Desktop\\Data.xlsx")


More than one Table

In the console:
> library(readxl)

More than one Table
Now repeat the same steps for the Dept.xlsx table

More than one Table
Then repeat them for the Faculty.xlsx table


More than one Table

Now merge all the tables to create a single table for analysis
Procedure:
First merge the Data table with the Dept table, then merge the resulting table with the Faculty table
m1=merge(Data,Dept,by="Dept. Code")

Now write m1 to another file, tab1.csv

write.csv(m1, file = "tab1.csv")


More than one Table

The resulting table (m1) is then merged with the Faculty table

m3=merge(m1,Faculty,by="Faculty Code") # join the third table to the merged result

write.csv(m3, file = "tab3.csv")
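The two-step merge can be tried on small in-memory tables before running it on the Excel files. In the sketch below the key names "Dept. Code" and "Faculty Code" and the columns Name, Level and Salary come from the slides; which table holds which column, and every value, are assumptions made for illustration.

```r
# Toy stand-ins for Data, Dept and Faculty; every value is made up.
# check.names = FALSE keeps column names containing spaces unchanged.
Data <- data.frame("Faculty Code" = c(1, 2, 3),
                   "Dept. Code"   = c(10, 20, 10),
                   Salary         = c(500, 600, 700),
                   check.names = FALSE)
Dept <- data.frame("Dept. Code" = c(10, 20),
                   Name         = c("Finance", "Marketing"),
                   check.names = FALSE)
Faculty <- data.frame("Faculty Code" = c(1, 2, 3),
                      Level          = c("Professor", "Lecturer", "Lecturer"),
                      check.names = FALSE)

m1 <- merge(Data, Dept, by = "Dept. Code")      # add department names
m3 <- merge(m1, Faculty, by = "Faculty Code")   # add faculty levels
m3

# Department-wise salary total from the merged table
dept_total <- aggregate(Salary ~ Name, data = m3, FUN = sum)
dept_total
```

Note that read.csv() converts spaces in column names to dots by default, which is why the later slides refer to Travel.expenses after tab3.csv is read back.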




More than one Table

Q1. Calculate the department-wise salary total

library(devtools)
library(rpivotTable)
pivot2=read.csv("tab3.csv",header=TRUE)

rpivotTable(pivot2, rows="Name",
aggregatorName="Sum", vals="Salary",
rendererName="Table")

More than one Table

Q2. Calculate the level-wise salary total

rpivotTable(pivot2, rows="Level",
aggregatorName="Sum", vals="Salary",
rendererName="Table")

More than one Table

Q3. Calculate the department-wise and level-wise salary total

rpivotTable(pivot2,
rows="Name", cols="Level", aggregatorName="Sum",
vals="Salary",
rendererName="Table")

More than one Table

Q4. Calculate the travel expenses of each department
rpivotTable(pivot2, rows="Name", aggregatorName="Sum",
vals="Travel.expenses", rendererName="Table")


More than one Table

One file is a Notepad text file and the other is an Excel file:
storesales.txt and states.xlsx

Q: State-wise product count

More than one Table

To import the txt file:

data2 <- read.delim(file.choose(), header = TRUE)

View(data2)

More than one Table

Now import the xlsx file as before

library(readxl)
states <- read_excel("C:/Users/PRS/Desktop/merger/states.xlsx")
View(states)

m4=merge(data2,states,by="Store")
write.csv(m4, file = "tab5.csv")
library(rpivotTable)
pivot3=read.csv("tab5.csv",header=TRUE)
rpivotTable(pivot3, rows="State", aggregatorName="Sum",
vals="Units", rendererName="Table")
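The text-plus-Excel workflow can be rehearsed end to end with stand-in data. The sketch below writes a small tab-delimited file, reads it with read.delim(), merges it with a lookup table on Store, and totals Units per State in base R. Store, State and Units follow the slides; the Product column and all values are made up, and a plain data frame stands in for states.xlsx.

```r
# Toy tab-delimited text standing in for storesales.txt (made-up rows).
txt <- tempfile(fileext = ".txt")
writeLines(c("Store\tProduct\tUnits",
             "S1\tPen\t10",
             "S2\tPen\t5",
             "S1\tBook\t3"), txt)
data2 <- read.delim(txt, header = TRUE)

# Toy lookup table standing in for states.xlsx
states <- data.frame(Store = c("S1", "S2"),
                     State = c("Haryana", "Punjab"))

m4 <- merge(data2, states, by = "Store")                      # attach State to each sale
state_units <- aggregate(Units ~ State, data = m4, FUN = sum) # units sold per state
state_units
```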


Thank you !!!