By:
Monika Mishra
Nanjesh Ramesh
CIS-5270 BUSINESS INTELLIGENCE
Table of Contents
2 Data Set
3. Dataset details
4. Column details
3 Data Cleaning
2. Histogram
5. Correlation Matrix
1. Introduction:
The superstore industry comprises companies that operate large retail spaces which store
and supply large quantities of goods. These extensive stores sell a typical product line of
grocery items and merchandise, such as food, pharmaceuticals, apparel, games and toys,
hobby items, furniture and appliances. Analyzing this industry is important because it
gives insight into the sales and profits of various products. Our analysis is based on a
superstore dataset for the United States, covering products ordered between 2015 and 2018.
With this analysis, the Superstore can identify various aspects of the shopping pattern and
DATA SET
https://data.world/stanke/sample-superstore-2018
The dataset provides information about the sales and profit from a US supermarket from
3. Dataset details:
Size: 2.4 MB
Number of columns: 21
Number of rows: 9994
Original file format: XLS
4. Column details:
Country US
DATA CLEANING
1. Renaming Column
Goal: The column name “CT” was not descriptive. The aim is to rename the column to “City”.
Before
After
Code Used
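The code for this step is not reproduced in the text. A minimal sketch of the rename in base R, assuming the data frame is called superstore as elsewhere in this report (the two rows below are made up purely for illustration):

```r
# Toy stand-in for the superstore table; the values are illustrative only.
superstore <- data.frame(CT = c("Los Angeles", "Seattle"),
                         Sales = c(261.96, 731.94),
                         stringsAsFactors = FALSE)

# Rename the column "CT" to "City" by matching on its current name.
names(superstore)[names(superstore) == "CT"] <- "City"

print(names(superstore))
```

Matching on the name rather than on a column position keeps the rename correct even if the column order changes.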
Full Screenshot
Goal: The column named “Country” needs to be removed because it contains only one value,
“United States”.
Before
After
Code Used
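The code for this step is not reproduced in the text. One way to sketch it in base R, assuming the data frame is called superstore (the rows below are made up for illustration), is to confirm that the column holds a single value and then drop it:

```r
# Toy stand-in for the superstore table; the values are illustrative only.
superstore <- data.frame(Country = c("United States", "United States"),
                         City = c("Los Angeles", "Seattle"),
                         stringsAsFactors = FALSE)

# Confirm the column really holds a single distinct value before dropping it.
n_distinct_country <- length(unique(superstore$Country))
print(n_distinct_country)

# Assigning NULL removes the column from the data frame.
superstore$Country <- NULL

print(names(superstore))
```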
Full Screenshot
Goal: To duplicate the column “Order.Date” as “order” and then split “order” into month,
day and year.
Before
No column after Profit
After
Code Used
superstore$order<-superstore$Order.Date
library(tidyr)
superstore<-separate(superstore,order,c("month","day","year"),sep="/")
Full Screenshot
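For comparison, the same split can be sketched in base R without tidyr. This is an alternative to the separate() call above, not the code used in the report. The month/day/two-digit-year format of the toy dates is an assumption, suggested by the two-digit years that appear in the function output later in this report.

```r
# Toy stand-in for the superstore table; the dates are illustrative only and
# assume a month/day/year layout separated by "/".
superstore <- data.frame(Order.Date = c("11/8/16", "6/12/15"),
                         stringsAsFactors = FALSE)

# Split every date on "/" and bind the pieces into a three-column matrix.
parts <- do.call(rbind, strsplit(as.character(superstore$Order.Date), "/", fixed = TRUE))

# Store the pieces as new character columns.
superstore$month <- parts[, 1]
superstore$day   <- parts[, 2]
superstore$year  <- parts[, 3]

print(superstore)
```

As with separate(), the new columns are character vectors; wrap them in as.integer() if numeric sorting is needed.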
Analysis
The bar chart above displays the total number of orders by region. The Western region has
the highest order count (more than 3000), followed by the Eastern region with a count
close to 3000 and the Central region with a count of around 2300. The fewest orders were
placed by
Code Used
+ xlab="Region", col="lightblue")
Full Screenshot
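Only the last line of the bar chart call survives in the listing above. A minimal sketch of how such a chart could be produced in base R, assuming one row per order and a column named Region (the toy rows, the chart title and the y-axis label are assumptions):

```r
# Toy stand-in for the superstore table: one row per order.
superstore <- data.frame(
  Region = c("West", "East", "West", "Central", "South", "East", "West"),
  stringsAsFactors = FALSE
)

# Count the number of orders in each region.
region_counts <- table(superstore$Region)
print(region_counts)

# Draw one bar per region; xlab and col match the surviving fragment.
barplot(region_counts,
        main = "Number of Orders by Region",
        ylab = "Order Count",
        xlab = "Region", col = "lightblue")
```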
Analysis
The histogram above shows the frequency distribution of the quantity ordered. The most
frequently ordered quantity is 1, with a frequency greater than 3000, followed by 2, with a
frequency close to 2500. In general, the frequency decreases as the quantity ordered
increases. A quantity of 14 has the lowest frequency.
Code Used
Ordered",
Full Screenshot
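Only a fragment of the histogram call survives in the listing above. A minimal sketch in base R, assuming a numeric column named Quantity (the toy values, title and axis label are assumptions). The breaks are centred on whole numbers so that each bar corresponds to exactly one quantity; with other break choices, adjacent quantities can fall into the same bin.

```r
# Toy stand-in for the Quantity column; the values are illustrative only.
superstore <- data.frame(Quantity = c(1, 2, 2, 3, 3, 3, 5, 14))

# One bin per integer quantity: breaks at 0.5, 1.5, ..., 14.5.
breaks <- seq(min(superstore$Quantity) - 0.5,
              max(superstore$Quantity) + 0.5, by = 1)

h <- hist(superstore$Quantity, breaks = breaks,
          main = "Distribution of Quantity Ordered",
          xlab = "Quantity Ordered", col = "lightblue")

# Frequency of each quantity from 1 to 14.
print(h$counts)
```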
Analysis
The pie chart above shows the percentage of sales by category. There are three categories –
contributed the most towards sales, at 36%. It is followed by “Furniture”, which
Code Used
> install.packages("dplyr")
> library("dplyr")
> library(magrittr)
> pct<-round(gd$Sales/sum(gd$Sales)*100)
> lbls<-paste(gd$Category,pct)
Full Screenshot
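The listing above uses a table gd whose definition is not preserved. A minimal sketch of the whole step, assuming gd holds total sales per category; it uses base R aggregate() in place of the dplyr pipeline the report appears to have used, and the toy rows and chart title are assumptions:

```r
# Toy stand-in for the superstore table; the values are illustrative only.
superstore <- data.frame(
  Category = c("Furniture", "Technology", "Office Supplies", "Technology"),
  Sales = c(300, 250, 200, 250),
  stringsAsFactors = FALSE
)

# Assumed definition of gd: total sales for each category.
gd <- aggregate(Sales ~ Category, data = superstore, FUN = sum)

# Percentage share of each category, rounded to whole percent.
pct <- round(gd$Sales / sum(gd$Sales) * 100)

# Slice labels such as "Technology 50%".
lbls <- paste0(gd$Category, " ", pct, "%")

pie(gd$Sales, labels = lbls, main = "Sales by Category")
print(lbls)
```

Because each share is rounded independently, the percentages shown on the chart may not always add up to exactly 100.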
Analysis
The tree map above provides information about the sales and profit of the various product
categories and sub-categories. The cell size is determined by sales, and the color gradient
represents profit. The map shows that the sub-category “Phones” under “Technology” has the
highest sales. The sub-category
Code Used
> install.packages("treemap")
> library(treemap)
"Profit",type="value",palette="RdYlGn",range=c(-20000,60000),mapping=c(-
c(15,10),align.labels = list(c("centre","centre"),c("left","top")))
Full Screenshot
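The treemap call above is only partly preserved. A minimal sketch of the idea, assuming columns named Category, Sub.Category, Sales and Profit (the column names, toy rows and title are assumptions). Sales and profit are first totalled per sub-category; the drawing step runs only if the treemap package is installed.

```r
# Toy stand-in for the superstore table; the values are illustrative only.
superstore <- data.frame(
  Category     = c("Technology", "Technology", "Furniture", "Furniture"),
  Sub.Category = c("Phones", "Phones", "Tables", "Chairs"),
  Sales        = c(900, 600, 700, 500),
  Profit       = c(120, 80, -60, 40),
  stringsAsFactors = FALSE
)

# Total sales and profit for each category / sub-category pair.
tm <- aggregate(cbind(Sales, Profit) ~ Category + Sub.Category,
                data = superstore, FUN = sum)
print(tm)

# Cell size follows Sales; cell colour follows Profit on a red-yellow-green scale.
if (requireNamespace("treemap", quietly = TRUE)) {
  treemap::treemap(tm,
                   index = c("Category", "Sub.Category"),
                   vSize = "Sales", vColor = "Profit",
                   type = "value", palette = "RdYlGn",
                   title = "Sales and Profit by Sub-Category")
}
```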
Analysis
This correlation matrix chart shows how the variables Sales, Quantity, Discount and Profit
are correlated with one another. The color gradient from red to blue describes the
direction and strength of each correlation, with red indicating a negative correlation and
blue a positive one. “Sales” and “Profit” are somewhat correlated, while “Profit” and
“Quantity” are only very weakly correlated.
Code Used
> install.packages("corrplot")
> View(mydata)
> library(corrplot)
> mydata.cor
> corrplot(mydata.cor)
Full Screenshot
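The listing above does not show how mydata and mydata.cor were created. A minimal sketch, assuming mydata is the subset of the four numeric columns named in the analysis and mydata.cor is its Pearson correlation matrix (the toy values are made up). The plot is drawn only if the corrplot package is installed.

```r
# Toy stand-in for the superstore table; the values are illustrative only.
superstore <- data.frame(
  Sales    = c(100, 200, 300, 400, 500),
  Quantity = c(2, 3, 2, 5, 4),
  Discount = c(0.0, 0.2, 0.1, 0.4, 0.3),
  Profit   = c(20, 15, 45, 10, 60)
)

# Assumed definition of mydata: the numeric columns of interest.
mydata <- superstore[, c("Sales", "Quantity", "Discount", "Profit")]

# Pairwise Pearson correlations; every entry lies between -1 and 1.
mydata.cor <- cor(mydata)
print(round(mydata.cor, 2))

if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(mydata.cor)
}
```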
6. Which product types have been ordered the most times?
Analysis
Word clouds (also known as text clouds or tag clouds) work in a simple way: the more often a
specific word appears in a source of textual data (such as a speech, blog post, or
database), the bigger and bolder it appears in the word cloud. In our case, we want to
know which kinds of products have been ordered most frequently. The word cloud above shows
that products related to “Xerox” have been ordered the most. Products related to binders,
chairs and Avery have also been ordered many times.
Code Used
> install.packages("tm")
> install.packages("SnowballC")
> install.packages("wordcloud")
> install.packages("RColorBrewer")
> library(tm)
> library(SnowballC)
> library(RColorBrewer)
> library(wordcloud)
Full Screenshot
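Only the package setup survives in the listing above. The word frequencies that a word cloud is sized by can be sketched in base R as follows. This counts words directly instead of using the tm pipeline the report loads, and the product names are made up for illustration. The cloud itself is drawn only if the wordcloud and RColorBrewer packages are installed.

```r
# Illustrative product names (not taken from the dataset).
product_names <- c("Xerox 1967", "Xerox 225", "Avery Binder Labels",
                   "Avery Durable Binders", "Xerox 1881")

# Split on anything that is not a letter, lower-case, and drop empty pieces.
words <- tolower(unlist(strsplit(product_names, "[^A-Za-z]+")))
words <- words[nchar(words) > 0]

# Frequency of each word, most frequent first.
word_freq <- sort(table(words), decreasing = TRUE)
print(head(word_freq, 3))

# The more frequent a word, the larger it is drawn.
if (requireNamespace("wordcloud", quietly = TRUE) &&
    requireNamespace("RColorBrewer", quietly = TRUE)) {
  wordcloud::wordcloud(words = names(word_freq), freq = as.vector(word_freq),
                       min.freq = 1,
                       colors = RColorBrewer::brewer.pal(8, "Dark2"))
}
```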
1. Statistical Summary
75%.
Max. (Maximum) 22638.480 The highest value of sales present in the table.
> setwd("~/Desktop/BI")
> superstore<-read.csv("superstore.csv")
> View(superstore)
> summary(superstore$Sales)
Result
Full Screenshot
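To make the summary() output easier to read, the sketch below applies it to a small made-up vector (the values are assumptions, not dataset values) and shows that the quartiles it reports match those returned by quantile():

```r
# Illustrative sales values only.
sales <- c(10, 20, 30, 40, 100)

# Six-number summary: Min., 1st Qu., Median, Mean, 3rd Qu., Max.
s <- summary(sales)
print(s)

# The 25%, 50% and 75% points computed directly.
q <- quantile(sales, probs = c(0.25, 0.5, 0.75))
print(q)
```

In this toy vector the mean (40) is above the median (30) because a single large value pulls the mean upwards.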
Question – What are the total sales for each year for a particular user-provided state?
Answer – To answer this question, we created a user-defined function that takes a state
name as its input parameter and displays the total sales by year for that state. The state
name provided by the user is validated by checking whether it is present in the
superstore table. If it is not present, an error message is shown. If present, the line chart
Full Screenshot
Code Screenshot
Execution Screenshot
Function Code
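statesales<-function(inputstate)
{
  # importing libraries
  library(tidyr)
  library(dplyr)
  library(ggplot2)
  # distinct state names present in the superstore table
  state_name<-distinct(superstore, State)
  # validity check reconstructed from the description; the original line is not preserved
  isvalid<-inputstate %in% state_name$State
  if (isvalid==TRUE)
  {
    selected<-select(superstore, State, Sales, year)
    filtered<- filter(selected,State==inputstate)
    # total sales per year for the selected state
    aggregated<-aggregate(filtered$Sales,by=list(filtered$year),sum)
    print(aggregated)
    # (the line chart step is not preserved in this listing)
  }
  else
  {
    # wording of the error message is assumed; the original is not preserved
    print("State not found in the superstore table")
  }
}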
Execution Script
> setwd("~/Desktop/BI")
> source("sales.R")
> statesales("LA")
> statesales("California")
Group.1 x
1 15 91303.53
2 16 88443.84
3 17 131551.91
4 18 146388.34
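The line chart itself is not reproduced in the text. As a rough sketch, the yearly totals printed above for California can be drawn as a line chart with base R plot(), used here in place of ggplot2 (which the function loads). The axis labels and title are assumptions.

```r
# Yearly totals for California, copied from the output above.
aggregated <- data.frame(Group.1 = c(15, 16, 17, 18),
                         x = c(91303.53, 88443.84, 131551.91, 146388.34))

# Line chart with points; the x axis is drawn separately so that only
# the four year values appear as tick marks.
plot(aggregated$Group.1, aggregated$x, type = "o", xaxt = "n",
     xlab = "Year", ylab = "Total Sales",
     main = "Total Sales by Year: California")
axis(1, at = aggregated$Group.1)

# Year with the highest total sales.
best_year <- aggregated$Group.1[which.max(aggregated$x)]
print(best_year)
```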
CODE SUMMARY
a. Renaming Column
superstore$order<-superstore$Order.Date
library(tidyr)
superstore<-separate(superstore,order,c("month","day","year"),sep="/")
2. Visualization Codes
a. Bar Chart
+ xlab="Region", col="lightblue")
b. Histogram
Ordered",
c. Pie Chart
> install.packages("dplyr")
> library("dplyr")
> library(magrittr)
> pct<-round(gd$Sales/sum(gd$Sales)*100)
> lbls<-paste(gd$Category,pct)
d. Tree Map
> install.packages("treemap")
> library(treemap)
"Profit",type="value",palette="RdYlGn",range=c(-20000,60000),mapping=c(-
c(15,10),align.labels = list(c("centre","centre"),c("left","top")))
e. Correlation Matrix
> install.packages("corrplot")
> View(mydata)
> library(corrplot)
> mydata.cor
> corrplot(mydata.cor)
f. Word Cloud
> install.packages("tm")
> install.packages("SnowballC")
> install.packages("wordcloud")
> install.packages("RColorBrewer")
> library(tm)
> library(SnowballC)
> library(RColorBrewer)
> library(wordcloud)
+ colors=brewer.pal(8, "Dark2"))
> setwd("~/Desktop/BI")
> superstore<-read.csv("superstore.csv")
> View(superstore)
> summary(superstore$Sales)
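statesales<-function(inputstate)
{
  # importing libraries
  library(tidyr)
  library(dplyr)
  library(ggplot2)
  # distinct state names present in the superstore table
  state_name<-distinct(superstore, State)
  # validity check reconstructed from the description; the original line is not preserved
  isvalid<-inputstate %in% state_name$State
  if (isvalid==TRUE)
  {
    selected<-select(superstore, State, Sales, year)
    filtered<- filter(selected,State==inputstate)
    # total sales per year for the selected state
    aggregated<-aggregate(filtered$Sales,by=list(filtered$year),sum)
    print(aggregated)
    # (the line chart step is not preserved in this listing)
  }
  else
  {
    # wording of the error message is assumed; the original is not preserved
    print("State not found in the superstore table")
  }
}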