1. Introduction
Big data refers to large, complex information sets that are transformed to yield new insights,
which businesses then apply to make effective decisions. In the retail industry, big data analytics
and business intelligence tools are used together on a selected data set to generate useful
insights that support the big data strategy. This requires identifying the big data technologies
used for data collection, storage, processing, transformation, and analysis, along with the query
languages, data modeling, and big data architecture solution needed to support the identified use
cases in the retail industry [ CITATION Eri172 \l 1033 ]. Executives in retail business
organizations face problems in adopting and applying big data tools to support the business,
even though they have extensive experience and knowledge of the business. In the present
report, a big data strategy document is developed covering the big data use cases in the retail
industry, a critical analysis of big data technologies, the use of big data tools on the data set, a
critical analysis of the output, and a big data architecture solution. It will help business
executives improve their knowledge of big data technologies and
Big data has five major use cases in the retail industry: customer behaviour analytics,
predictive analytics to increase the conversion rate, customer journey analytics, personalization
of the in-store experience for customers, and supply chain and operational analytics.
Performing customer behaviour analytics: Structured and unstructured big data comes from
marketing campaigns, conversion rates, estimates of the customer churn rate, and interaction on
social media. All of this data is aggregated and analysed to generate insights that identify
customers' motives for purchasing products from retail stores, find high-value customers,
explain consumer behaviour, and determine the best time to reach target customers
[ CITATION BMA18 \l 1033 ]. These insights further help in acquiring customers and increasing
their loyalty.
Applying predictive analytics to increase the conversion rate: Predictive analytics is an
effective big data technology for collecting all information related to customers, including their
interests and interactions on social media. With predictive analytics, it is easy to correlate
customer profiles with purchasing history and behaviour on social media sites. The insights
generated through this approach help in promoting and advertising the retail business and its
special promotions on Facebook pages and TV shows.
Customer journey analytics: Big data engineering technologies integrate unstructured and
structured data into tools such as Hadoop and Apache Hive, which analyse the data sets
irrespective of data type. The analytical data is used to increase customer retention and drive
sales. Big data enabled insights support marketers in communicating continuously and in
understanding the journey of each customer across different marketing channels
[ CITATION Abu19 \l 1033 ]. The analytical results will reveal new patterns in
Personalization of in-store experience to customers: Big data has a crucial role in personalizing
the retail store experience for customers by gathering data from supply chain systems,
websites, mobile apps, and points of sale and analysing it on data engineering platforms
[CITATION Ham16 \l 1033 ]. Omni-channel retail businesses can analyse customer in-store
behaviour and provide timely offers to increase the online and offline
Supply chain and operational analytics: Big data engineering enhances operational efficiency
by providing information on patterns, trends, and outliers, which saves considerable money by
improving decisions. Big data analytics supports retailers in analysing product distribution and
the supply chain to minimize costs and increase the quality of services. Data such as ERP, CRM,
public data, and geolocation is used to analyse the causes of a problem by visualizing the data.
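As a rough illustration of how outliers in operational data might be surfaced, the sketch below flags values whose z-score exceeds a threshold. The delivery figures, the function name `flag_outliers`, and the threshold of 2.0 are all hypothetical choices made for this example; they do not come from the data set analysed in this report.

```python
from statistics import mean, pstdev

# Hypothetical daily delivery costs for one distribution route (illustrative only).
DELIVERY_COSTS = [100.0, 102.0, 98.0, 101.0, 99.0, 250.0]

def flag_outliers(values, threshold=2.0):
    """Return (index, value) pairs whose absolute z-score exceeds the threshold.

    The z-score is (value - mean) / population standard deviation.
    The threshold of 2.0 is an assumption chosen for this small sample.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, so nothing can be an outlier
    return [
        (i, v) for i, v in enumerate(values)
        if abs(v - mu) / sigma > threshold
    ]

flagged = flag_outliers(DELIVERY_COSTS)
print(flagged)
```

A z-score test is only one simple option; for skewed retail data, a rule based on the interquartile range may be more robust.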
Big data is associated with a set of technologies that broaden its application in the retail
sector. These include cloud computing, NoSQL databases, column-oriented databases, granular
computing, data virtualization, data mining, machine learning, MapReduce, Hadoop, PIG, Hive,
WibiData, Platfora, and analytical tools.
Cloud computing: Cloud computing enabled big data is a recent technology used in the retail
industry for data monetization and for learning from past experience. Cloud computing is used
to resolve various big data challenges, including resource provisioning, data locality, business
scalability, data streaming, and improving quality of service[ CITATION Yan16 \l 1033 ].
NoSQL database: This technology includes document stores, key-value stores, and graph
unstructured, and structured data. By avoiding the use of traditional databases with NoSQL
the queries and reducing the unstructured data generated. This database permits data
compression and fast execution of queries when updating the data[ CITATION Moo14 \l 1033 ].
Granular computing: Granular computing is a popular technology in the big data domain, used
for pattern recognition, intelligent analysis, machine learning, and developing decision-making
models. The granules include clusters, classes, subsets, intervals, and groups
Data virtualization: Data virtualization is a technology that helps deliver data from a variety of
sources, including distributed data stores and Hadoop stores, in real time[ CITATION Gil16 \l 1033 ].
Data mining: Mining different data sets has become a big data technology in its own right. It is
needed to obtain accurate results, identify developments in the retail sector, and determine
future technology trends. It discovers patterns hidden in large volumes of streams and
data
Machine learning: Big data based machine learning is used for knowledge discovery and for
making intelligent decisions. It applies learning approaches that include ensemble and
incremental learning, data stream learning, and deep learning. Machine learning itself is
divided into three types: reinforcement learning, unsupervised learning, and supervised
learning.
MapReduce: It is a programming model for executing massive jobs in parallel. It comprises
two tasks, Map and Reduce. The Map task converts the dataset into key-value pairs, and the
Reduce task combines the outputs of the Map task into a smaller, aggregated
form[ CITATION Abu19 \l 1033 ].
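The two MapReduce tasks can be sketched in a few lines of plain Python. This is a single-process illustration of the idea, not Hadoop's actual API: the sales records and the function names `map_task`, `shuffle`, `reduce_task`, and `run_job` are hypothetical, and a real framework would distribute each phase across many nodes.

```python
from collections import defaultdict

# Hypothetical point-of-sale records: (store_type, sale_amount).
SALES = [
    ("grocery", 120.0),
    ("apparel", 80.0),
    ("grocery", 30.0),
    ("apparel", 20.0),
    ("electronics", 200.0),
]

def map_task(record):
    """Map: turn one input record into a (key, value) pair."""
    store_type, amount = record
    return (store_type, amount)

def shuffle(pairs):
    """Group values by key, as a framework does between Map and Reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_task(key, values):
    """Reduce: combine all values for one key into a single result."""
    return (key, sum(values))

def run_job(records):
    """Run Map, shuffle, and Reduce in sequence on an in-memory list."""
    pairs = [map_task(r) for r in records]
    grouped = shuffle(pairs)
    return dict(reduce_task(k, v) for k, v in grouped.items())

totals = run_job(SALES)
print(totals)
```

Here the job totals sales per store type; swapping `sum` for another aggregation in `reduce_task` changes what the job computes without touching the Map step.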
Hadoop: Hadoop is an open source platform for managing big data. It works easily with a
variety of data sources and supports large-scale data processing, including machine learning
tasks. A major use case of Hadoop is the collection and analysis of location
PIG: PIG is useful in connecting business users and developers with Hadoop. It uses a
scripting language called Pig Latin to execute queries on data stored in a Hadoop cluster.
Hive: Hive provides an SQL-like query language that permits the use of business intelligence
applications. It is a data warehouse application that supports reading, managing, and
WibiData: This technology is developed by combining HBase and Hadoop. It lets the websites
of retail organizations explore customer data, respond to customers based on their behaviour,
and recommend products.
Platfora: The Platfora platform supports creating queries to automate Hadoop jobs and
Analytical tools: The analytical tools used in big data are of four types. Predictive analytics
helps retail firms predict performance based on the past purchasing history of
customers[ CITATION Lep20 \l 1033 ]. Descriptive analytics provides information about the
current business status of firms based on available data. Diagnostic analytics supports finding
the root causes of retailers' problems in attracting customers. Prescriptive analytics improves
big data service levels and minimizes the expenses of operations.
A data set is a collection of information presented in a tabular format consisting of several
records. A data set may hold critical data in more than one database table. In the retail
industry, data sets are defined around a particular aspect, such as grocery sales by store type,
retail sales by category, and apparel sales by store type. These are combined to form large
volumes of data that generate insights and add value to the business. The critical elements of
the data are analysed to gain value by generating insights into business processes. A data set
includes numbers, values, and text information related to a specific subject. The selected
dataset contains structured data, unstructured data, and metadata. The data for the analysis
was downloaded from the data.world website. This is related to
properties considered include authors, title, date and year created, size of file, date or year of
RapidMiner and Tableau are the two big data tools used to conduct the critical analysis of the data.
RapidMiner: RapidMiner is a data science software platform that provides an integrated
environment for data preparation, deep learning, machine learning, predictive analytics, and
text mining. The tool is written in Java and is rapidly gaining acceptance in big data analytics.
It supports data mining procedures including data processing, data visualization, predictive
analytics, and statistical data modelling. RapidMiner is used in business and commercial
applications, business analytics, and application development[ CITATION Rap20 \l 1033 ]. It
is designed for analytics teams and unifies the data science lifecycle from data preparation to
predictive model operations. It offers several benefits including easier use of machine
Tableau Software: Tableau Public is free software that permits businesses and
visualization of data for websites. It is an intuitive and simple tool for generating insights
through data visualization[ CITATION Dat193 \l 1033 ]. Tableau is available at a lower cost
than other players in the data analytics market. Tableau visuals make it possible to assess
hypotheses, check insights, and explore data. It does not require any programming knowledge,
and data visualizations can be published on a website for free. The availability of shared
content makes this tool one of the best for big data analytics.
5. Critical Analysis on the Output
The outputs of the data analysis performed on the data sets considered on sales generated by
The dataset is uploaded to the RapidMiner website to obtain the data performance results. The
steps followed are adding data, selecting the required columns, selecting the inputs, selecting
the models, and inspecting the results. In the analysis, the generalized linear model is
considered, which gives 60% accuracy and a 40% classification error.
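The two reported figures are complementary: accuracy is the share of records classified correctly, and classification error is the share classified incorrectly, so together they sum to 100%. The sketch below shows that arithmetic in Python. The labels are hypothetical and chosen only to reproduce a 60%/40% split; they are not the records from the RapidMiner run, and the helper names are assumptions made for this example.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def classification_error(y_true, y_pred):
    """Fraction of predictions that do not match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)

# Hypothetical labels for ten records (illustrative only): six predictions match.
y_true = ["high", "high", "low", "low", "high", "low", "high", "low", "high", "low"]
y_pred = ["high", "low", "low", "high", "high", "low", "low", "low", "high", "high"]

acc = accuracy(y_true, y_pred)
err = classification_error(y_true, y_pred)
print(f"accuracy={acc:.0%}, classification error={err:.0%}")
```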
The architecture of big data provides the tools for performing data analysis. These tools are
used in various business applications. The Hadoop based big data architecture is
Data sources: The data sources are of two types, new sources and traditional sources. The new
data sources include email, web logs, social media, and sensors, and the traditional data sources
include RDBMS, Online Analytical Processing (OLAP), and Online Transaction Processing
(OLTP) systems. These help retail firms produce the data required to generate insights for the
business. The data gathered is transferred to the data system, where the big data analytics
technologies are applied.
Data system: The data system includes the enterprise platform for Hadoop, the relational
database management system, the electronic data warehouse, and massively parallel processing
used for data storage. The data systems and the Hadoop platform are linked with each other for
exchanging, storing, and retrieving information. The data processed is
Applications: These are the applications that use the big data results and insights. They
include business analytics, enterprise applications, and custom applications. The applications
and the data system exchange data regularly for different processes and decision-making.
Data and development tools: The data and development tools are useful in building and testing
the data
Operational tools: These tools are used for managing and monitoring operational data.
Hadoop is used in the big data architecture to implement the batch processing system. Its
framework supports running applications across many nodes on terabytes of data while
reducing the chance of system failure. It has several components that provide benefits
including increased scalability, block size, high throughput, and reduced chances to
7. Conclusion
Big data usage is increasing day by day in various areas of the retail industry. The present
report develops a big data strategy document covering key points such as use cases of big data,
critical analysis of big data technologies, use of big data tools on the data sets, critical analysis
of the output, and a big data architecture solution. The use cases of big data in the retail sector
are described as customer behaviour analytics, predictive analytics to increase the conversion
rate, customer journey analytics, personalization of the in-store experience for customers, and
supply chain and operational analytics. Big data technologies identified from the research
include cloud computing, NoSQL databases, column-oriented databases, granular computing,
data virtualization, data mining, machine learning, MapReduce, Hadoop, PIG, Hive, WibiData,
Platfora, and analytical tools. Critical analysis is performed on these technologies to enhance
awareness of data collection, processing, storage, and application. The big data analytics tools
selected for analysing the datasets are Tableau Public and RapidMiner. The output obtained by
applying the big data tools to the dataset is clearly illustrated. The benefits of these tools are
also described, along with how they are useful in generating insights. The big data architecture
and its components are identified as data sources, data system,