
Executive Summary

1. Introduction

Big data refers to large, complex data sets that are processed and analysed to gain new insights, which businesses then use to make effective decisions. In the retail industry, big data analytics and business intelligence tools are applied together to selected data sets to generate useful insights that support a big data strategy. Such a strategy needs to identify the big data technologies used for data collection, storage, processing, transformation, and analysis, as well as query languages, data modelling, and a big data architecture solution that supports the identified retail use cases [CITATION Eri172 \l 1033]. Executives in retail organizations often struggle to adopt and apply big data tools, even though they have extensive experience and knowledge of the business. This report develops a big data strategy document covering big data use cases in the retail industry, a critical analysis of big data technologies, the use of big data tools on a data set, a critical analysis of the output, and a big data architecture solution. It is intended to help business executives improve their knowledge of big data technologies and apply them successfully in their firms.

2. Use Cases of Big Data in Retail Industry

Big data has five major use cases in the retail industry: customer behaviour analytics, predictive analytics to increase conversion rates, customer journey analytics, personalization of the in-store experience, and supply chain and operational analytics.

Performing customer behaviour analytics: Retailers collect structured and unstructured data from marketing campaigns, conversion rates, customer churn estimates, and social media interactions. Aggregating and analysing this data together generates insights into why customers buy from retail stores, which customers are of high value, how consumers behave, and when is the best time to reach target customers [CITATION BMA18 \l 1033]. These insights in turn help retailers acquire customers and increase their loyalty.
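
To make the aggregation step concrete, the following is a minimal Python sketch, assuming a hypothetical list of transactions (customer ID, channel, amount). It totals spend per customer across channels and flags high-value customers above an illustrative threshold. The records, threshold, and function names are invented for illustration and are not drawn from any cited source.

```python
from collections import defaultdict

# Hypothetical transaction records: (customer_id, channel, amount spent).
transactions = [
    ("C1", "store", 120.0),
    ("C2", "web", 35.5),
    ("C1", "web", 80.0),
    ("C3", "store", 15.0),
    ("C2", "store", 60.0),
]

def spend_per_customer(records):
    """Aggregate total spend per customer across all channels."""
    totals = defaultdict(float)
    for customer_id, _channel, amount in records:
        totals[customer_id] += amount
    return dict(totals)

def high_value_customers(totals, threshold):
    """Return (customer, spend) pairs at or above the threshold, highest first."""
    selected = [(cid, spend) for cid, spend in totals.items() if spend >= threshold]
    return sorted(selected, key=lambda item: item[1], reverse=True)

totals = spend_per_customer(transactions)
print(high_value_customers(totals, threshold=90.0))  # [('C1', 200.0), ('C2', 95.5)]
```

In practice the same grouping would run over millions of records on a distributed platform, but the logic of "aggregate, then rank" is unchanged.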

Applying predictive analytics in increasing conversion rate: Predictive analytics is an effective big data technique that draws on the information collected about customers, including their interests and their interactions on social media. With predictive analytics, retailers can correlate customer profiles with purchasing history and social media behaviour. The resulting insights help in promoting and advertising the retail business and its special promotions on Facebook pages and TV shows.

Customer journey analytics: Big data engineering technologies integrate unstructured and structured data into tools such as Hadoop and Apache Hive, so that data sets can be analysed irrespective of data type. The analytical results are used to increase customer retention and drive sales. Big data enabled insights help marketers maintain continuous communication with customers and understand each customer's journey across different marketing channels [CITATION Abu19 \l 1033]. The analysis can also reveal new patterns in customer behaviour.
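
A customer journey can be reconstructed by ordering each customer's touchpoints by time and then counting which channel sequences are most common. The sketch below is a simplified illustration in plain Python rather than Hadoop or Hive; the event log, touchpoint names, and helper functions are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (customer_id, timestamp, touchpoint).
events = [
    ("C1", 3, "purchase"),
    ("C1", 1, "social_ad"),
    ("C2", 1, "email"),
    ("C1", 2, "website"),
    ("C2", 2, "website"),
    ("C2", 3, "purchase"),
    ("C3", 1, "social_ad"),
    ("C3", 2, "website"),
    ("C3", 3, "purchase"),
]

def journeys(log):
    """Order each customer's touchpoints by timestamp to rebuild their journey."""
    by_customer = defaultdict(list)
    for customer_id, ts, touchpoint in log:
        by_customer[customer_id].append((ts, touchpoint))
    return {cid: tuple(tp for _, tp in sorted(steps)) for cid, steps in by_customer.items()}

def common_paths(paths):
    """Count how many customers followed each distinct channel sequence."""
    return Counter(paths.values())

paths = journeys(events)
print(common_paths(paths).most_common(1))
```

Here two of the three hypothetical customers reach a purchase via a social media ad followed by the website, which is the kind of pattern marketers would act on.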

Personalization of in-store experience to customers: Big data plays a crucial role in personalizing the retail store experience by gathering data from supply chain systems, websites, mobile apps, and point-of-sale systems and analysing it on data engineering platforms [CITATION Ham16 \l 1033]. Omni-channel retailers can analyse customers' in-store behaviour and respond with timely offers that increase both online and offline purchases.

Supply chain and operational analytics: Big data engineering enhances operational efficiency by revealing patterns, trends, and outliers, which improves decisions and can save a lot of money. Big data analytics helps retailers analyse product distribution and the supply chain to minimize costs and increase the quality of services. Data from ERP and CRM systems, public data, and geolocation data can be visualized to analyse the causes of a problem.
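
One simple way to flag outliers in operational data is a z-score test: a value is unusual if it lies far from the mean in units of standard deviation. The sketch assumes hypothetical daily shipment counts and an illustrative threshold of 2.0; real supply chain analytics would typically use more data and more robust methods.

```python
import statistics

# Hypothetical daily units shipped from one distribution centre.
daily_units = [102, 98, 105, 99, 101, 97, 180, 100, 103, 96]

def outliers_by_zscore(values, threshold=2.0):
    """Return (index, value) pairs whose absolute z-score exceeds the threshold."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample standard deviation
    return [(i, v) for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]

print(outliers_by_zscore(daily_units))  # [(6, 180)]
```

The flagged day would then be investigated, for example by cross-checking ERP and CRM records for that date.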

3. Critical Analysis of Big Data Technologies

Big data is associated with a set of technologies that extend its application in the retail sector. These include cloud computing, NoSQL databases, column-oriented databases, granular computing, data virtualization, data mining, machine learning, MapReduce, Hadoop, PIG, Hive, WibiData, Platfora, and analytical tools.

Cloud computing: Cloud computing-enabled big data is a recent technology used in the retail industry for data monetization, learning from past experience. Cloud computing is used to address various big data challenges, including resource provisioning, data locality, business scalability, data streaming, and quality of service [CITATION Yan16 \l 1033].

NoSQL database: This category includes document stores, key-value stores, and graph databases, which support the storage and retrieval of high volumes of structured, semi-structured, and unstructured data. Compared with traditional relational databases, NoSQL databases increase scalability and schema flexibility, although many of them achieve this by relaxing consistency guarantees (for example, eventual consistency).


Column oriented databases: A column-oriented database stores the values of each column together, which improves the performance of analytical queries that read only a few columns of a table. Storing similar values together also permits effective data compression, so such queries execute faster [CITATION Moo14 \l 1033].
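
The following sketch contrasts the two layouts using plain Python lists, assuming a small hypothetical sales table. It pivots rows into columns, sums one column without touching the others, and applies run-length encoding (one of several compression schemes columnar stores can use) to a column of repeated values. It is a conceptual illustration, not how any particular database is implemented.

```python
# Hypothetical sales rows: (store_id, category, units, revenue).
rows = [
    ("S1", "shoes", 3, 150.0),
    ("S1", "apparel", 5, 200.0),
    ("S2", "shoes", 2, 90.0),
    ("S2", "shoes", 4, 170.0),
]

def to_columns(records, names):
    """Pivot row tuples into one list per column (a columnar layout)."""
    return {name: [rec[i] for rec in records] for i, name in enumerate(names)}

def run_length_encode(column):
    """Compress a column into [value, count] runs of consecutive repeats."""
    runs = []
    for value in column:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return runs

columns = to_columns(rows, ["store_id", "category", "units", "revenue"])
# An aggregate over one attribute reads only that column's values.
total_revenue = sum(columns["revenue"])
print(total_revenue)                           # 610.0
print(run_length_encode(columns["store_id"]))  # [['S1', 2], ['S2', 2]]
```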

Granular computing: Granular computing is a popular technique in the big data domain for pattern recognition, intelligent analysis, machine learning, and building decision-making models. Granules such as clusters, classes, subsets, intervals, and groups are used to create effective computational models for applications including document analysis, data mining, biometrics, and financial gaming.

Data virtualization: Data virtualization is a technology that delivers data in real time from a variety of sources, including distributed data stores and Hadoop stores [CITATION Gil16 \l 1033].

Data mining: Mining different data sets has become a core big data technology. It is needed to obtain accurate results, identify developments in the retail sector, and determine future technology trends. It discovers patterns hidden in large volumes of data and data streams.
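
As an example of discovering hidden patterns, the sketch below counts which pairs of products are bought together across point-of-sale baskets and keeps the pairs that appear in at least a minimum number of baskets. This is only the support-counting step of market basket analysis, not a full association-rule algorithm such as Apriori; the baskets and threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets from point-of-sale data.
baskets = [
    {"sneakers", "socks", "t-shirt"},
    {"sneakers", "socks"},
    {"jeans", "t-shirt"},
    {"sneakers", "socks", "jeans"},
]

def frequent_pairs(transactions, min_support):
    """Count item pairs bought together; keep those in at least min_support baskets."""
    counts = Counter()
    for basket in transactions:
        # sorted() gives each pair a canonical order so (a, b) and (b, a) match
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

print(frequent_pairs(baskets, min_support=2))  # {('sneakers', 'socks'): 3}
```

A retailer could use such a pair to place socks next to sneakers or bundle them in a promotion.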

Machine learning: Big data-based machine learning is used for knowledge discovery and intelligent decision-making through approaches such as ensemble and incremental learning, data stream learning, and deep learning. Machine learning itself is commonly divided into three types: supervised learning, unsupervised learning, and reinforcement learning.

MapReduce: MapReduce is a programming model for executing massive jobs in parallel. It comprises two tasks, Map and Reduce. The Map task converts the input data into intermediate key-value pairs, and the Reduce task combines the values that share a key into a smaller set of results [CITATION Abu19 \l 1033].
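
The Map, shuffle, and Reduce stages can be sketched on a single machine in plain Python, using the classic word-count example over hypothetical product reviews. In Hadoop the framework distributes these stages across nodes and performs the shuffle itself; this sketch only shows how data flows between the stages.

```python
from collections import defaultdict
from itertools import chain

def map_phase(record):
    """Map: emit an intermediate (key, value) pair for each word in a record."""
    return [(word.lower(), 1) for word in record.split()]

def shuffle(pairs):
    """Shuffle: group intermediate values by key (handled by the framework in Hadoop)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def map_reduce(records):
    intermediate = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

reviews = ["great shoes", "great price", "shoes too small"]
print(map_reduce(reviews))  # {'great': 2, 'shoes': 2, 'price': 1, 'too': 1, 'small': 1}
```

Because each Map call depends on only one record and each Reduce call on only one key, both stages can run on many machines at once, which is what makes the model suit massive jobs.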

Hadoop: Hadoop is an open-source platform for managing big data. It works with a variety of data sources and supports large-scale data processing, including machine learning tasks. A major use case of Hadoop in the retail industry is collecting and analysing location-based data from social media.

PIG: Apache Pig helps business users and developers work with Hadoop. It uses a high-level scripting language called Pig Latin to run queries on data stored in a Hadoop cluster.

Hive: Hive is a data warehouse application built on Hadoop that supports reading, writing, and managing data using HiveQL, a SQL-like query language, which allows business intelligence applications to work with the data [CITATION Rid15 \l 1033].
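
The kind of query a business user might run is sketched below. Because Hive requires a Hadoop installation, this example substitutes Python's built-in sqlite3 module as a stand-in; the query uses only standard SQL constructs (SUM, GROUP BY, ORDER BY) of the kind HiveQL also supports. The table, columns, and figures are hypothetical and are not taken from the downloaded data set.

```python
import sqlite3

def total_sales_by_store_type(rows):
    """Load rows into an in-memory table and aggregate sales per store type.

    sqlite3 stands in for Hive here; this does not run on Hadoop.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (store_type TEXT, month TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
        query = """
            SELECT store_type, SUM(amount) AS total_sales
            FROM sales
            GROUP BY store_type
            ORDER BY total_sales DESC
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()

# Hypothetical figures for illustration only.
sample_rows = [
    ("department store", "2016-01", 1200.0),
    ("department store", "2016-02", 1100.0),
    ("clothing store", "2016-01", 900.0),
    ("clothing store", "2016-02", 950.0),
]
print(total_sales_by_store_type(sample_rows))
# [('department store', 2300.0), ('clothing store', 1850.0)]
```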

WibiData: This technology combines HBase and Hadoop. It allows retail organizations' websites to explore customer data, respond to customers based on their behaviour, and provide product recommendations.

Platfora: The Platfora platform supports creating queries that automate Hadoop jobs, and organizes and simplifies the data sets stored in Hadoop.

Analytical tools: The analytical tools used in big data are of four types: predictive, prescriptive, diagnostic, and descriptive analytics. Predictive analytics helps retail firms predict performance based on customers' past purchasing history [CITATION Lep20 \l 1033]. Descriptive analytics provides information about a firm's current business status based on the available data. Diagnostic analytics helps find the root causes of retailers' problems, such as difficulty in attracting customers. Prescriptive analytics recommends actions that improve service levels and minimize operating expenses.
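
The difference between descriptive and predictive analytics can be illustrated with a few lines of Python. The sketch assumes hypothetical monthly sales figures: describe() summarises what has already happened, while linear_trend_forecast() fits a least-squares straight line and extrapolates it. A linear trend is a deliberately simple stand-in for the predictive models retailers would use in practice.

```python
import statistics

# Hypothetical monthly sales (in thousands), oldest first.
monthly_sales = [199.0, 212.0, 219.0, 231.0, 238.0, 251.0]

def describe(values):
    """Descriptive analytics: summarise what has already happened."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }

def linear_trend_forecast(values, steps_ahead=1):
    """Simplified predictive analytics: fit y = a + b*t by least squares, then extrapolate."""
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = statistics.mean(values)
    slope = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values)) / sum(
        (t - t_mean) ** 2 for t in range(n)
    )
    intercept = y_mean - slope * t_mean
    return intercept + slope * (n - 1 + steps_ahead)

print(describe(monthly_sales))               # mean and median are both 225.0
print(linear_trend_forecast(monthly_sales))  # 260.0 (trend of +10 per month)
```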

4. Use of Big Data Tools on the Dataset

A data set is a collection of information, usually in tabular form, consisting of several records. It may draw its critical data from more than one database table. In the retail industry, data sets focus on a particular aspect, such as grocery sales by store type, retail sales by category, or apparel sales by store type. Combined, such data sets form large volumes of data from which insights can be generated to add value to the business. Retailers analyse the critical elements of the data to gain value by generating insights into business processes. A data set can include numbers, values, and text related to a specific subject. The selected data set includes structured data, unstructured data, and metadata. It was downloaded for analysis from the data.world website and relates to retail sales of shoes and apparel.

Metadata property     Value
Author                Gary Hoover
Title                 Apparel Sales by Store Type
Date/Year created     December 11, 2016
File size             23.53 KB
Date/Year modified    2016
Keywords              Business, retail, economics, retailing
Publisher             Data World

As shown in the table, the data set on apparel sales by store type was used. The metadata properties recorded are the author, title, date created, file size, date modified, publisher, and keywords.


Several big data tools have been developed to analyse different data sets. RapidMiner and Tableau are two such tools, used here to conduct a critical analysis of the data.

RapidMiner: RapidMiner is a data science software platform that provides an integrated environment for data preparation, machine learning, deep learning, text mining, and predictive analytics. It is written in Java and is quickly gaining acceptance in big data analytics. It supports data mining procedures including data preprocessing, data visualization, predictive analytics, and statistical data modelling. RapidMiner is used in business and commercial applications, business analytics, and application development [CITATION Rap20 \l 1033]. It is designed to let analytics teams unify the data science lifecycle, from data preparation to the operation of predictive models. Its benefits include easier use of machine learning, better connectivity to enterprise data, assured business alignment, competitive advantage, and scalable prescriptive analytics.

Tableau Software: Tableau Public, a free version of Tableau, allows businesses and individuals to connect to a file or spreadsheet and develop interactive data visualizations for the web. It is an intuitive and simple tool for generating insights through data visualization [CITATION Dat193 \l 1033]. Tableau is available at a lower cost than other players in the data analytics market. Tableau visuals make it possible to test hypotheses, check insights, and explore data. It does not require programming skills, and data visualizations can be published on a website for free. The availability of shared content has made it one of the best tools for big data analytics.

5. Critical Analysis on the Output

The outputs of the analysis performed on the data set of sales generated by different retail giants are described below.

RapidMiner output results:

The data set was uploaded to the RapidMiner website to obtain performance results. The steps followed were adding the data, selecting the required columns, selecting the inputs, selecting the models, and inspecting the results. In this analysis, the generalized linear model achieved 60% accuracy, corresponding to a 40% classification error.
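
Accuracy and classification error are complementary: classification error is the share of records the model labels incorrectly, so it equals one minus accuracy, which is why 60% accuracy corresponds to 40% error. The sketch below computes both from hypothetical actual and predicted labels; it does not reproduce the report's data set or RapidMiner's generalized linear model, and the function name is illustrative.

```python
def accuracy_and_error(actual, predicted):
    """Return (accuracy, classification error); the two always sum to 1."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    accuracy = correct / len(actual)
    return accuracy, 1 - accuracy

# Hypothetical labels for 10 test records ("high" vs "low" sales).
actual    = ["high", "low", "high", "high", "low", "low", "high", "low", "high", "low"]
predicted = ["high", "high", "high", "low", "low", "high", "high", "low", "low", "low"]
acc, err = accuracy_and_error(actual, predicted)
print(f"accuracy={acc:.0%}, classification error={err:.0%}")
# accuracy=60%, classification error=40%
```

On a balanced two-class problem, 60% accuracy is only modestly better than the 50% expected from guessing, which is worth keeping in mind when judging the model.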

6. Big Data Architecture Solution

A big data architecture provides the tools for performing data analysis, and these tools are used in various business applications. Figure 1 shows a Hadoop-based big data architecture.

Figure 1: Big data architecture [CITATION Lak16 \l 1033]


The big data architecture has three main parts: data sources, data systems, and applications, supported by data and development tools and operational tools.

Data sources: There are two types of data sources, new and traditional. New data sources include email, web logs, social media, and sensors; traditional data sources include RDBMS, Online Analytical Processing (OLAP), and Online Transaction Processing (OLTP) systems. These sources help retail firms generate the data required for business insights. The gathered data is transferred to the data system, where big data analytics technologies are applied.

Data system: The data system includes an enterprise Hadoop platform, a relational database management system (RDBMS), an enterprise data warehouse (EDW), and massively parallel processing (MPP) systems used for data storage. The data systems and the Hadoop platform are connected so they can exchange, store, and retrieve information. The processed data is then transferred to applications [CITATION GoM16 \l 1033].

Applications: These applications apply big data results and insights. They include business analytics, enterprise applications, and custom applications. The applications and the data system exchange data regularly for different processes and for decision-making.

Data and development tools: These tools are used for building and testing data pipelines.

Operational tools: These tools are used to manage and monitor the operational data.

Hadoop is used in this architecture to implement batch processing. Its framework runs applications across clusters of many nodes that process terabytes of data, and it reduces the chance of system failure by replicating data across nodes. Its components offer several benefits, including increased scalability, a large block size that supports high throughput, and greater resilience to hardware and software failures.
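
The effect of block size and replication on storage can be estimated with simple arithmetic. The sketch assumes the defaults of Hadoop 2.x and later, a 128 MB block size (dfs.blocksize) and a replication factor of 3 (dfs.replication); actual clusters may configure these differently, and hdfs_storage() is a hypothetical helper, not part of Hadoop.

```python
import math

def hdfs_storage(file_size_mb, block_size_mb=128, replication=3):
    """Estimate (blocks, block replicas, raw storage in MB) for one HDFS file.

    Defaults assume dfs.blocksize = 128 MB and dfs.replication = 3.
    The last block only holds the remaining bytes, so raw storage is
    simply the file size multiplied by the replication factor.
    """
    blocks = math.ceil(file_size_mb / block_size_mb)
    return blocks, blocks * replication, file_size_mb * replication

blocks, block_replicas, raw_mb = hdfs_storage(1000)
print(blocks, block_replicas, raw_mb)  # 8 24 3000
```

With three replicas, losing a single node still leaves two copies of every block, which is the basis of the fault tolerance described above; the price is roughly three times the raw storage.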

7. Conclusion

The use of big data is increasing day by day across many areas of the retail industry. This report developed a big data strategy document covering several key points: use cases of big data, a critical analysis of big data technologies, the use of big data tools on a data set, a critical analysis of the output, and a big data architecture solution. The use cases of big data in the retail sector are customer behaviour analytics, predictive analytics to increase conversion rates, customer journey analytics, personalization of the in-store experience, and supply chain and operational analytics. The big data technologies identified in the research include cloud computing, NoSQL databases, column-oriented databases, granular computing, data virtualization, data mining, machine learning, MapReduce, Hadoop, PIG, Hive, WibiData, Platfora, and analytical tools. These technologies were critically analysed to raise awareness of how data is collected, processed, stored, and applied. The big data analytics tools selected for analysing the data set were Tableau Public and RapidMiner, and the output obtained by applying them to the data set was presented. The benefits of these tools, and how they help generate insights, were also described. Finally, the components of the big data architecture were identified as data sources, data systems, and applications, supported by data and development tools and operational tools.
