Learning Objectives
• Explain the meaning of data, information, knowledge and wisdom
• Explain how data, information and knowledge are linked with each other
• Explain the different types of data
• Explain the ways to manage data
• Explain data management using big data
• Explain analytics in business and career in analytics
What is Data, Information, Knowledge, and Wisdom?
• Data refers to statistics, individual facts, or any specific item of information; it can be numeric or collected through observation.
• From a technical point of view, data refers to a set of values that are of qualitative
or quantitative variables. This can be about one or more persons or objects. A
datum refers to a single value of a single variable.
• In a general sense, Information simply refers to data that is processed, organised, and structured. Information gives data a context that helps in decision making.
• Familiarity, awareness, or understanding of something or someone, such as facts, skills, or objects, is known as Knowledge. Knowledge can be gained in numerous ways and from various sources, which may include reason, perception, testimony, memory, scientific inquiry, education, and practice.
• Wisdom, at the top of this hierarchy, is the ability to apply knowledge with sound judgement in making decisions.
What is Data, Information, Knowledge, and Wisdom?
Structured Data
• A set of data that complies with a data model that is pre-defined in nature and is
simple and straightforward to analyse is known as structured data.
• Structured data will be in a tabular format and there will be a defined
relationship between different rows and columns.
• Excel files or SQL databases are some of the common examples of structured data.
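As a minimal sketch (pandas and the sample values are assumptions, not part of the original material), structured data is exactly what a tabular library models: named columns with a defined relationship between rows and columns.

# A minimal sketch of structured data, assuming pandas is installed;
# the sample rows are made up. Each column has a fixed meaning, just
# as in an Excel sheet or an SQL table.
import pandas as pd

customers = pd.DataFrame(
    {
        "customer_id": [101, 102, 103],
        "name": ["Asha", "Ravi", "Meena"],
        "city": ["Pune", "Delhi", "Chennai"],
        "annual_spend": [52000, 38000, 61000],
    }
)
print(customers)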
Unstructured Data
• Unstructured data is a set of data that might or might not have any logical or
repeating patterns. Unstructured data:
Typically includes metadata, i.e., additional information related to the data
Comprises inconsistent data, such as data obtained from files, social media
websites, satellites, etc.
Types of Data
Semi-Structured Data
• Semi-structured data, also known as schema-less or self-describing data, refers to a form of structured data that contains tags or markup elements in order to separate semantic elements and generate hierarchies of records and fields in the given data.
• This type of data does not follow the formal structure of data models found in relational databases.
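A minimal sketch of semi-structured data (the JSON record is made up for illustration): keys act as tags that separate semantic elements, and nesting creates hierarchies, yet there is no fixed relational schema.

# A minimal sketch of semi-structured data using Python's standard
# json module; the record is made up. Keys (tags) separate semantic
# elements and nesting creates hierarchies, but different records
# may carry different fields.
import json

record = """
{
  "customer": "Asha",
  "orders": [
    {"id": 1, "items": ["tea", "sugar"]},
    {"id": 2, "items": ["rice"], "gift_wrap": true}
  ]
}
"""
data = json.loads(record)
print(data["orders"][1]["gift_wrap"])   # a field the first order lacks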
Types of Data
Quantitative Data
• The data that expresses a certain quantity, amount, or range is known as quantitative data. With this kind of data, there are usually measurement units associated with it, e.g., metres in the case of a person’s height.
Types of Data
Qualitative Data
• Qualitative data refers to data that involves descriptive and conceptual findings
that may be collected through questionnaires, interviews, or observation.
• By carefully analysing qualitative data, one will be able to explore ideas and will
be able to further explain quantitative results.
How to Manage Data?
Database
• A database refers to an organised collection of data that is stored and accessed electronically from a computer system. Some of the more complicated databases are developed by making use of formal design and modelling techniques.
Variety: Beyond the massive volumes and data velocities lies another challenge,
i.e., operating on the vast variety of data. Seen as a whole, these datasets are
incomprehensible without any finite or defined structure.
Variability: A single word can have multiple meanings. Newer trends are created and older ones are discarded over time – the same goes for meanings as well. Big Data’s limitless variability poses a unique deciphering challenge if its full potential is to be realised.
Veracity: What Big Data tells you and what the data tells you can be two different things. If the data being analysed is incomplete or inaccurate, the Big Data solution will be erroneous. This situation occurs when data streams come in a variety of formats. Without cleaning up the data it begins with, the veracity of the overall analysis and effort is compromised.
Data Management using Big Data
Visualisation: Another daunting task for a Big Data system is to represent the
immense scale of information it processes into something easily comprehensible
and actionable. For human purposes, the best methods are conversion into
graphical formats like charts, graphs, diagrams, etc.
Value: Big Data offers excellent value to those who can actually tame it at scale and unlock the true knowledge it holds. It also offers newer, more effective methods of bringing products to their true value, even in formerly unknown markets and demand segments.
Data Management using Big Data
• The advent of IT, the Internet, and globalisation has facilitated the generation of increased volumes of data and information at an exponential rate, which has led to an “information explosion.”
• This, in turn, fuelled the evolution of Big Data, which started in the 1940s and continues to date.
• This table is only a synopsis of the evolution. The idea of Big Data began when a librarian speculated about the need for more storage shelves for books, as explained in Table 1, and with time, Big Data has grown into a cultural, technological, and scholarly phenomenon.
• The generation of Big Data, and with it new storage and processing solutions
equipped to handle this information, helped businesses to:
Enhance and streamline existing databases
Add insight to existing opportunities
Data Management using Big Data
Hadoop
Traditional technologies have proved incapable of handling the huge amounts of data generated in organisations, or of fulfilling the processing requirements of such data. Therefore, a need was felt to combine a number of technologies and products into a system that can overcome the challenges faced by traditional processing systems in handling Big Data.
One of the technologies designed to process Big Data (which is a combination of
both structured and unstructured data available in huge volumes) is known as
Hadoop.
Hadoop is an open-source platform that provides analytical technologies and
computational power required to work with such large volumes of data.
Hadoop platform provides an improved programming model, which is used to
create and run distributed systems quickly and efficiently.
Data Management using Big Data
The process of accessing external devices used to consume too much time
during which the CPU could not be used for any other operation.
The advantage of using external devices for data storage is that secondary
storage is economical as compared to primary storage.
Hive
Hive is a mechanism through which we can access the data stored in Hadoop
Distributed File System (HDFS).
Hive provides an SQL-like interface known as HiveQL, or the Hive Query Language. This interface translates a given query into MapReduce code.
HiveQL enables users to perform tasks using the MapReduce concept but
without explicitly writing the code in terms of the map and reduce functions.
Data Management using Big Data
The data stored in HDFS can be accessed through HiveQL, which contains the
features of SQL but runs on the MapReduce framework.
It should be noted that Hive is not a complete database and is not meant to be
used in Online Transactional Processing Systems, such as online ticketing,
bank transactions, etc.
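As a minimal sketch of how HiveQL is used in practice (the PyHive client, the localhost:10000 HiveServer2 address, and the sales table are assumptions for illustration, not part of the original material):

# A minimal sketch, assuming the PyHive package, a HiveServer2
# instance on localhost:10000, and a hypothetical `sales` table.
from pyhive import hive

conn = hive.Connection(host="localhost", port=10000, database="default")
cursor = conn.cursor()

# HiveQL looks like SQL; Hive translates it into MapReduce jobs,
# so no explicit map or reduce functions are written.
cursor.execute(
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
)
for region, total in cursor.fetchall():
    print(region, total)

conn.close()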
Pig
Pig was designed and developed for performing a long series of data operations.
The Pig platform is specially designed for handling many kinds of data, be it
structured, semi-structured, or unstructured.
Pig was developed at Yahoo in 2006. Its aim, as a research project, was to provide a simple way to use Hadoop and focus on examining large datasets.
Pig became an Apache project in 2007. By 2009, other companies had started using Pig, making it a top-level Apache project in 2010.
Data Management using Big Data
Pig usage can be divided into three categories: ETL (Extract, Transform, and Load), research, and interactive data processing.
Pig consists of a scripting language, known as Pig Latin, and a Pig Latin
compiler.
The Pig programming language offers the following benefits:
Ease of coding: Using Pig Latin, we can write complex programs. The code is simple and easy to understand and maintain. It explicitly encodes complex tasks involving interrelated data transformations as data flow sequences.
Optimisation: Pig Latin encodes tasks in such a way that they can be easily
optimised for execution. This allows users to concentrate on the data
processing aspects without bothering about efficiency.
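The following is a minimal, illustrative sketch of this data-flow style: a Pig Latin word-count script embedded in a Python string and submitted through the standard pig command-line client. The script contents, file paths, and a local Pig installation are assumptions for illustration.

# A minimal sketch: an illustrative Pig Latin word-count script,
# submitted via the standard `pig` CLI in local mode. The paths and
# a local Pig installation are assumptions.
import subprocess
import tempfile

PIG_SCRIPT = """
lines  = LOAD 'input.txt' AS (line:chararray);
words  = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
grpd   = GROUP words BY word;
counts = FOREACH grpd GENERATE group, COUNT(words);
STORE counts INTO 'wordcount_out';
"""

with tempfile.NamedTemporaryFile("w", suffix=".pig", delete=False) as f:
    f.write(PIG_SCRIPT)
    script_path = f.name

# -x local runs Pig against the local file system instead of HDFS.
subprocess.run(["pig", "-x", "local", "-f", script_path], check=True)

Notice that the script reads as a sequence of data transformations rather than as map and reduce functions; Pig compiles this flow into MapReduce jobs.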
Data Management using Big Data
The breakthrough approach used to build the Tableau Desktop tool takes
pictures of data and converts them into optimized database queries, which help
users in spotting patterns, identifying trends, and deriving logical conclusions
and insights.
While working with the Tableau Desktop, the data analyst need not write any
code; all the insights can, instead, be discovered by just connecting to the data
and following the thoughts that strike the mind naturally.
You can easily connect to your data, which is either in the memory or on the
server.
Tableau Desktop allows you to directly retrieve data from the server or load it
in the Tableau data engine from a disk.
Tableau Desktop is designed to work at the speed of thought, and everything can be done with drag-and-drop technology.
Data Management using Big Data
Tableau Desktop provides options for data sharing in the form of dashboards,
which can be used to reflect relationships by highlighting and filtering data.
Dashboards can also help you create storylines in a guided manner for explaining the insights obtained from data.
The important features of Tableau software include the following:
Single-click data analytics in visual form
Management of metadata
R Language
R is a cross-platform programming language as well as a software environment
for statistical computing and graphics.
Generally, it is used by statisticians and data miners for developing statistical
software and doing data analysis. It is also believed that R is an
implementation of the S programming language combined with lexical scoping
semantics inspired by Scheme.
R is a GNU project, which is freely available under the GNU General Public License, and its pre-compiled binary versions are provided for various operating systems. R compiles and runs on a wide variety of UNIX platforms, Windows, and macOS.
Any discussion of programming languages for business analytics that covers R must also cover Python.
Data Management using Big Data
Python
Python is a high-level, open-source, interpreted language that is ideal for
object-oriented programming. Python has a lot of features for dealing with
arithmetic, statistics and scientific functions.
Python is open-source software, which means anybody can freely download it
from www.python.org and use it to develop programs. Its source code can be
accessed and modified as required in the projects.
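As a minimal sketch of Python used for analytics (the pandas library and a hypothetical sales.csv file with region and amount columns are assumptions for illustration):

# A minimal sketch, assuming the pandas library is installed and a
# hypothetical sales.csv file with `region` and `amount` columns.
import pandas as pd

df = pd.read_csv("sales.csv")

# Summary statistics for every numeric column.
print(df.describe())

# Total and average sales per region.
print(df.groupby("region")["amount"].agg(["sum", "mean"]))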
Other Tools and Technologies to Handle Big Data
Some other important tools and technologies that are used for handling the big
data are as follows:
MapReduce: MapReduce was originally developed by Google; its website describes it as "a programming model and software framework for writing applications that rapidly process vast amounts of data in parallel on large clusters of compute nodes."
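To make the model concrete, here is a minimal in-memory sketch of the two phases for a word count, in plain Python with no Hadoop; the sample documents are made up:

# A minimal in-memory sketch of the MapReduce model (word count).
# Real frameworks run map and reduce in parallel across a cluster;
# here the shuffle/grouping step is simulated with a dictionary.
from collections import defaultdict

def map_phase(document):
    # Emit a (word, 1) pair for every word in the input.
    for word in document.split():
        yield word.lower(), 1

def reduce_phase(word, counts):
    # Aggregate all the values emitted for one key.
    return word, sum(counts)

documents = ["Big Data needs big tools", "data beats opinion"]

# Shuffle: group all emitted values by key.
groups = defaultdict(list)
for doc in documents:
    for word, one in map_phase(doc):
        groups[word].append(one)

results = [reduce_phase(word, counts) for word, counts in groups.items()]
print(sorted(results))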
Data Management using Big Data
• The act of gathering, organising, and analysing massive data sets in order to
identify distinct patterns and other important information is known as big data
analytics.
• Big data analytics is a combination of technologies and approaches that
necessitate new forms of integration in order to reveal big hidden values from vast
datasets that are different from the norm, more complicated, and on a massive
scale.
• It mostly focuses on tackling new or existing issues in more efficient and effective
ways.
• There are four types of big data analytics, which are as follows:
Descriptive analysis: It can be defined as condensing the existing data to get a
better understanding of what is going on using business intelligence tools. This
helps to get an idea about what happened in the past and if it was as expected
or not.
Analytics of Big Data
• The right analysis of the available data can improve major business processes in
various ways. For example, in a manufacturing unit, data analytics can improve
the functioning of the following processes:
Procurement—To find out which suppliers are more efficient and cost-effective
in delivering products on time
Product Development—To draw insights on innovative product and service
formats and designs for enhancing the development process and coming up
with demanded products
Manufacturing—To identify machinery and process variations that may be
indicators of quality problems
Distribution—To enhance supply chain activities and standardize optimal
inventory levels vis-à-vis various external factors such as weather, holidays,
economy, etc
Analytics of Big Data
Education: Big Data has transformed the modern day education processes
through innovative approaches, such as e-learning for teachers to analyse the
students’ ability to comprehend and thus impart education effectively in
accordance with each student’s needs.
Travel: The travel industry also uses Big Data to conduct business. It
maintains complete details of all the customer records that are then analysed
to determine certain behavioural patterns in customers.
Government: Big Data has come to play an important role in almost all the undertakings and processes of government. For instance, the Indian government body UIDAI was able to successfully implement the Aadhaar card using Big Data technologies, registering millions of citizens and performing trillions of data matches every day.
Analytics of Big Data
Healthcare: In healthcare, the pharmacy and medical device companies use Big
Data to improve their research and development practices, while health
insurance companies use it to determine patient-specific treatment therapy
modes that promise the best results.
Analytics of Big Data
Telecom: The mobile revolution and the Internet usage on mobile phones have
led to a tremendous increase in the amount of data generated in the telecom
sector. Managing this huge pool of data has almost become a challenge for the
telecom industry.
Consumer Goods Industry: Consumer goods companies generate huge volumes
of data in varied formats from different sources, such as transactions, billing
details, feedback forms, etc. This data needs to be organized and analysed in a
systemic manner in order to derive any meaningful information from it.
Business Analytics Models
• BA frequently utilises numerous quantitative tools to convert data into meaningful information for making informed business decisions.
• These tools can be further categorised into tools for data mining, operations
research, statistics and simulation.
Analytics of Big Data
• Businesses that have been in the market for long should conduct SWOT analysis periodically to evaluate the impact of changing market situations, get ahead of newer business models, and respond proactively.
• SWOT is not necessarily a pan-organisation process; rather each of the
organisation’s departments can have their own dedicated SWOT, such as
Marketing SWOT, Operational SWOT, Sales SWOT, etc.
• Consider an example of the implementation of SWOT analysis in an organisation, Apple Inc. Apple re-established itself in the late 1990s after a long battle with the existing stakeholders who had control over its shares and stock.
• Post return to the computing market, facing a mighty challenger in Microsoft,
Apple did not take them head-on as most would have expected.
• Apple identified opportunities in newer areas of the technology, while the world
was considering computers as the lone IT revolution torch-bearer.
Analytics of Big Data
• The market today needs plenty of talented and qualified people who can use their
expertise to help organizations deal with Big Data.
• Qualified and experienced Big Data professionals must have a blend of technical
expertise, creative and analytical thinking, and communication skills to be able to
effectively collate, clean, analyse, and present information extracted from Big
Data.
• Most jobs in Big Data are from companies that can be categorized into the
following four broad buckets:
Big Data technology drivers, e.g. Google, IBM, Salesforce
Big Data product companies, e.g. Oracle
Big Data services companies, e.g. EMC
Big Data analytics companies, e.g. Splunk
Career in Analytics
Data Scientist
Skills Required
• Big Data professionals can have various educational backgrounds, such as
econometrics, physics, biostatistics, computer science, applied mathematics, or
engineering.
• Data scientists mostly possess a master’s degree or Ph.D. because it is a senior
position and often achieved after considerable experience in dealing with data.
• Developers generally prefer implementing Big Data by using Hadoop and its
components.
Technical Skills
• A Big Data analyst should possess the following technical skills:
Understanding of Hadoop ecosystem components, such as HDFS, MapReduce,
Pig, Hive, etc.
Career in Analytics
• The preferred soft skills requirements for a Big Data professional are:
Strong written and verbal communication skills
Analytical ability
Basic understanding of how a business works
Let’s Sum Up
• Unstructured data is a set of data that might or might not have any logical or
repeating patterns.
• Semi-structured data refers to a form of structured data that contains tags or
markup elements in order to separate semantic elements and generate hierarchies
of records and fields in the given data.
• The data that expresses a certain quantity, amount, or range is known as
Quantitative data.
• Qualitative data refers to data that involves descriptive and conceptual findings
that may be collected through questionnaires, interviews, or observation.
• An information system is based on the discovery of hidden patterns in data, a valuable resource, in order to extract the information required for successful decision-making in an organisation.
Let’s Sum Up
Learning Objectives
• Understand the concept of business intelligence (BI)
• Describe the need for BI
• List the differences between BI and BA
• Explain the obstacles to BI in an organisation
• Discuss the emerging trends in BI
Business Intelligence (BI)
• The director of IT for Rubio’s Restaurants, Paul Nishiyama, says, “BI has been
extremely important to us. Finance is getting a whole series of more robust
reports that it did not have before. Producing those reports without a business-
intelligence system would be a manual process that would drain our small staff”.
Need for BI
• BI is the art of making decisions based on information, knowledge and experience.
With the advancement and involvement of computers in our daily life, various
computer-based techniques have improved the BI processes.
• The BI tools turn ‘data’ into ‘information’ and the ‘information’ further aids in
taking ‘decisions’ on time. This results in data transparency, consistency and
information reliability.
Business Intelligence (BI)
• Another new trend is the ability to combine multiple data projects into one while making the result useful in sales, marketing and customer support.
• One example is CRM – Customer Relationship Management software – which sources raw data from every division and department and compiles it into a new understanding that would otherwise not have been visible from any one point alone.
• All this boils down to the interchangeable usage of the terms ‘business intelligence’ and ‘business analytics’, and their importance in managing the relationship between business managers and data.
• Owners and managers now, as a result of such accessibility, need to be more
familiar with what data is capable of doing and how they need to actively produce
data to create lucrative future returns.
• The significance of the data has not changed; its availability has.
Obstacles to BI in an Organisation
• The amount of data generated daily from many sources in today's inventive world
is enormous. Data analytics is not just for big companies anymore.
• Businesses of all sizes are stepping up their investigative efforts. This might
entail a lot of data that can aid administrators in making good judgments.
• The greatest companies are hitting these new heights by utilising innovative
business intelligence (BI) services that are on the rise.
• Business intelligence is used by firms to account for this, and the proper use of
business intelligence may help organisations increase profits and revenues.
• However, the hardships that come with BI are also becoming apparent.
Obstacles to BI in an Organisation
• Because of our hyper-connected move into the mobile age, demand for mobile-based BI solutions has never been higher. However, with the increased demand for business intelligence comes an increased demand for mobile-optimised BI solutions.
• Business executives must be able to access data-driven reports and insights 24
hours a day, seven days a week in today's fast-paced, cutthroat digital market.
• While creating mobile-optimised BI solutions might be difficult, with the proper
interactive business intelligence platform, you can log in and get vital insights
from your mobile devices from anywhere in the globe without losing any critical
features or functionality.
• Your mobile-based business intelligence issues are no longer a concern.
Emerging Trends in BI
• Business Intelligence (BI) uses a set of methods, structures and technologies that convert raw data into valuable information, which is then used for running profitable business operations.
• BI tools analyse data and produce reports, summaries, dashboards, maps, graphs
and charts to provide users with detailed information on the nature of the
business.
• Business Intelligence (BI) is a technology-driven method for analysing data and
presenting useful information to help executives, managers and other end-users
make informed business decisions
• BI utilises computing techniques for the discovery, identification and analysis of business data – like products, sales revenue, earnings and costs.
• BI is the art of making decisions based on information, knowledge and experience.
• In a contemporary environment, organisations collect data on a routine basis to
determine how their customers relate to their standard business processes.
Let’s Sum Up
• While the importance of BI continues to grow, there are some major changes regarding its implementation in a contemporary organisation.
• The process of converting data into information and then applying that knowledge
in taking useful business decision is known as the value chain.
• Knowledge represents the learning that is the internalisation of information, data,
study and experience. It serves as the basis for all skills and abilities.
• The amount of data generated daily from many sources in today's inventive world
is enormous.
• Business intelligence is used by firms to account for this, and the proper use of
business intelligence may help organisations increase profits and revenues.
• Some of the top BI trends are SaaS BI, cloud BI, automation, real-time analytics, data integration, etc.
Introduction to Analytics – Session 3
Chapter 3: Resource Considerations to Support Business Analytics
• It is important to answer the WHY before starting to identify the roles and responsibilities that might be required for new analytics personnel.
• Any key analytical personnel will focus on the following:
To find insights related to customers, products and operations, any key person
will focus on building big data collection and analytics capabilities.
Analysing data sources and suggesting ideas that assist the organisation in strategic planning and in solving challenges on a one-time or periodic basis.
Helping the organisation to reach a data-driven decision.
Developing useful insights for customers and developing analytical models.
They might also be involved in creating applications for employees that help in
driving more efficiency or revenue.
Required Competencies for Personnel in Analytics
• One of the barometers of the success of a project is how the project was handled and how many loopholes were detected.
• The role of any business analyst is very important, mainly because he is the one
who acts as a bridge between customers, all the team members, and stakeholders.
• Any good analytics professional requires the following competencies:
Communication skills: It is very important to communicate the right information effectively and in a proper way to all the people in a meeting. One should ensure that the words spoken cannot be misinterpreted.
Domain knowledge: Any business analyst needs to have a good knowledge of
the domain in which he is working.
Required Competencies for Personnel in Analytics
• Develop your modelling skills: As the expression goes, a picture paints a thousand words. Models (such as process models) are compelling tools for passing on a lot of information without depending on text. A visual portrayal enables you to get an outline of the issue or project so that you can see what functions well and where the loopholes lie.
Business Analytics Data
• Any approach to analytics must adjust to changes in the way people work inside their business settings, particularly with growing data volumes.
• Arranging data in a way that makes sense for every business customer requires infusing content with context before the value of relevant filtering and representation can be added.
• Enriching the enormous amounts of data and presenting significant learning for every business consumer’s needs comes with many difficulties.
• Some of the data analytics challenges are as follows:
Content variety and quality: Information sources are no longer entirely structured. Business users depend on a pool of information objects that mix conventionally structured information with various other types of artefacts.
Business Analytics Data
Content organisation: Organising the data inputs begins with a set of meanings and semantics, but business requirements change over time. So, the models need to be flexible, with the capacity to accommodate taxonomic models and tag inputs, and to match them based on incidental content.
Connectivity: Any information source may have different levels of importance
inside a wide range of business settings. For instance, remarks about a bike’s
drivability might be more important coming from a vehicle enthusiast blog
owner, which can be checked through Twitter.
Personalisation challenges: More important than sifting through substantial volumes of data resources taken from a variety of sources is setting up a wide range of channels to recognise different filters of business value depending on who the customers are.
Business Analytics Data
• Technology considerations for business analytics include:
Computer network and equipment
Data analysis and statistical packages
Data visualisation tool
Data pre-processing tool
Data virtualisation tool
Technology for Business Analytics
• Businesses utilise analytics to investigate and analyse their data, then turn their discoveries into insights that help executives, managers and operational workers make better, more educated business choices.
• For the efficient use of BA, companies are required to have quantitative methods
and evidence-based data for business modelling and decision-making.
• Businesses employ four forms of analytics: descriptive analytics, which looks at what has occurred in the past; predictive analytics, which looks at what could happen in the future; prescriptive analytics, which looks at what should happen in the future; and diagnostic analytics, which looks at data or information to figure out why something happened.
• The role of any business analyst is very important, mainly because he is the one
who acts as a bridge between customers, all the team members and stakeholders.
Let’s Sum Up
• Any good analytics professional requires various skills. Some common skills are communication skills, critical analysis skills, problem-solving skills, management and leadership skills, technical awareness and time management skills.
• Business analytics is a composition of various solutions that are used in building
the model of analysis that will help in simulation and creating scenarios.
• Predictive analytics, data mining, applied analytics and statistics are all part of
the business analytics process, which is offered as an application that can be used
by any business user.
• Technology considerations for BA include: databases, the Web, Big Data, computer networks and equipment, data analysis and statistical packages, data visualisation tools, data pre-processing tools, and data virtualisation tools.
Introduction to Analytics – Session 4
Chapter 4: Descriptive Analytics
• Thus, the process of sampling aims to obtain enough information to draw valid conclusions about a population.
• Market researchers, for example, use sampling to gauge consumer perceptions of new or existing goods and services; auditors use sampling to verify the accuracy of financial statements; and quality control analysts sample production output to verify quality levels and identify opportunities for improvement.
• Descriptive statistics describes the data while inferential statistics infers about
the population from the sample.
Central Tendency
• Central tendency is the measurement of a single value that attempts to describe a set of data by identifying the central position within that set of data. Measures of central tendency are also called measures of central location. Some common valid measures of central tendency are as follows:
• Mean: The mathematical average is called the mean (or the arithmetic mean), which is the sum of the observations divided by the total number of observations. The mean of a population is denoted by μ and the sample mean by x̄.
Descriptive Statistics
• If the population contains N observations x1, x2, …, xN, then the population mean is calculated as:
μ = (x1 + x2 + … + xN)/N = (Σ xi)/N
• Median: The measure of location that specifies the middle value when the data is ordered (arranged from the least to the greatest or the greatest to the least) is the median. If the number of observations is odd, say 7, the median is the exact middle of the sorted numbers, i.e., the 4th observation. If the number of observations is even, say 8, the median is the mean of the two middle numbers, i.e., the mean of the 4th and 5th observations.
• Mode: A third method of measuring the location is called mode. It is the
observation/number/series that occurs the maximum number of times. The mode
is valuable for datasets containing smaller number of unique values. You can
easily identify the mode from a frequency distribution by identifying the value
having the largest frequency or from a histogram by identifying the highest bar.
• Midrange: A fourth measure of location that is used occasionally is the midrange.
This is simply the average of the greatest and least values in the data set.
Descriptive Statistics
Variability
• A commonly used measure of dispersion is the variance. Basically, variance is the average of the squared deviations of the observations from the mean. The bigger the variance, the more the observations spread out from the mean. This indicates more variability in the observations. The formula used for calculating the variance is different for populations and samples. The formula for the variance of a population is:
σ² = Σ (xi − μ)² / N
• where xi is the value of the ith item, N is the number of items in the population and μ is the population mean. The variance of a sample is calculated by using the formula:
s² = Σ (xi − x̄)² / (n − 1)
• where n is the number of items in the sample and x̄ is the sample mean.
Descriptive Statistics
Standard Deviation
• The square root of the variance is the standard deviation. For a population, the standard deviation is computed as:
σ = √(Σ (xi − μ)² / N)
Coefficient of Variation
• The coefficient of variation (CV) provides a relative measure of dispersion in data relative to the mean and is defined as:
CV = (standard deviation / mean) × 100%
• This statistic is useful when comparing the variability of two or more data sets whose scales differ. The coefficient of variation offers a relative risk-to-return measure: the smaller the coefficient of variation, the smaller the relative risk for the return provided. The reciprocal of the coefficient of variation, called return to risk, is often used because it is easier to interpret.
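A minimal sketch tying these measures together, using Python’s standard statistics module; the data values are made up for illustration:

# A minimal sketch of the measures above; the data is made up.
import statistics as st

data = [12, 15, 11, 18, 15, 14, 20, 15]

mean     = st.mean(data)                 # arithmetic mean
median   = st.median(data)               # middle value of sorted data
mode     = st.mode(data)                 # most frequent value
midrange = (max(data) + min(data)) / 2   # average of extremes

pvar = st.pvariance(data)   # population variance (divides by N)
svar = st.variance(data)    # sample variance (divides by n - 1)
sd   = st.stdev(data)       # sample standard deviation

cv = sd / mean * 100        # coefficient of variation, in percent
print(mean, median, mode, midrange, pvar, svar, sd, round(cv, 1))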
Descriptive Statistics
• Business analytics provides insight into the workings and value of a business, making it one of the most important IT functions for any running business.
• Descriptive analytics can be defined as condensing the existing data to get a better
understanding of what is going on using business intelligence tools.
• Descriptive analytics involves “What has occurred in the corporation” and “What
is going on now?”
• Statistics, as defined by David Hand, past president of the Royal Statistical
Society in the UK, is both the science of uncertainty and the technology of
extracting information from data.
• Statistics involves collecting, organising, analysing, interpreting and presenting
data.
• Central tendency is the measurement of a single value that attempts to describe a
set of data by identifying the central position within that set of data.
Let’s Sum Up
• The mathematical average is called the mean (or the arithmetic mean), which is
the sum of the observations divided by the total number of observations.
• The measure of location that specifies the middle value when the data are
arranged from least to greatest is the median.
• Mode is the observation/number/series that occurs the maximum number of times.
The mode is valuable for datasets containing smaller number of unique values.
• Midrange is simply the average of the greatest and least values in the data set.
• A commonly used measure of dispersion is the variance. Basically, variance is the
squared deviations average of the observations from the mean.
• The square root of the variance is the standard deviation.
• The coefficient of variation (CV) provides a relative measure of the dispersion in
data relative to the mean.
• Univariate descriptive statistics deals with information on only one variable.
Introduction to Analytics – Session 5
Chapter 5: Predictive Analytics
Finance: A bank can determine whether or not an applicant is a good candidate for a loan based on the applicant’s previous data, thus reducing risk.
Healthcare: According to ArborMetrix, predictive analytics can analyse past
patient data using AI and machine learning. The system can then predict
disease risks for particular patients.
Predictive Analytics
Logic-driven Models
• Logic-driven models are created on the basis of inferences and postulations which the sample space and existing conditions provide. Creating logical models requires a solid understanding of business functional areas, logical skills to evaluate propositions, and knowledge of business practices and research.
• To understand better, let us take the example of a customer who visits a restaurant around six times a year and spends around ₹5,000 per visit. The restaurant earns around a 40% margin on each visit’s billing amount. The annual gross profit on that customer turns out to be 5000 × 6 × 0.40 = ₹12,000. 30% of the customers do not return each year, while 70% do return to provide more business to the restaurant.
Predictive Modelling
Logic-driven Models
• Assuming the average lifetime of a customer (the time for which a consumer remains a customer) is 1/0.30 ≈ 3.33 years.
• So, the average gross profit for a typical customer turns out to be 12,000 × 3.33 = ₹39,960.
• Armed with all the above details, we can logically arrive at a conclusion and derive the following model for the above problem statement: average customer value = annual gross profit × average customer lifetime.
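A minimal sketch of this logic-driven model in Python; all figures come from the restaurant example above (note that 12,000 × 3.33 rounds to ₹39,960, while the exact value 12,000 × (1/0.30) is ₹40,000):

# A minimal sketch of the logic-driven customer-value model above;
# all figures come from the restaurant example.
visits_per_year  = 6
spend_per_visit  = 5000     # rupees
margin           = 0.40     # 40% gross margin on billing
defection_rate   = 0.30     # 30% of customers leave each year

annual_gross_profit = spend_per_visit * visits_per_year * margin
avg_lifetime_years  = 1 / defection_rate        # ~3.33 years
customer_value      = annual_gross_profit * avg_lifetime_years

print(annual_gross_profit)             # 12000.0
print(round(avg_lifetime_years, 2))    # 3.33
print(round(customer_value))           # 40000 (≈ ₹39,960 when 3.33 is used)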
Predictive Modelling
Data-driven Models
• The main aim of the data-driven model concept is to find links between the system state variables (input and output) without explicit knowledge of the physical attributes and behaviour of the system.
• Data-driven predictive modelling derives the modelling method from a set of existing data and entails a predictive methodology to forecast future outcomes. A model is data-driven only when there is no clear knowledge of the relationships among the variables/system, though there is a lot of data.
• Here, you are simply predicting the outcomes based on the data. The model is not
based on hand-picked variables, but may contain unobserved, hidden combination
of variables.
Data Mining
• Data Mining (DM) is the process of discovering trends and patterns from large sets of
data. Data mining involves mathematical and statistical analyses to obtain patterns
and trends that already exist in the data.
• Usually, these patterns are tough to decipher with traditional methods of data analysis because either the associations are too complex or the data is huge. A data mining function can, for example, be used to identify the details of customers who have not made any transaction in the last one year.
• Data mining is accomplished by building models. A model runs an algorithm over a set of data. Data mining models can be useful in specific scenarios, such as the following (a sketch of the second scenario appears after this list):
Forecasting
Determining risk and probability
Providing recommendations
Finding sequences and grouping
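As a minimal sketch of the second scenario, determining risk and probability (scikit-learn and the toy numbers are assumptions for illustration, not the chapter’s own method):

# A minimal sketch of a data-mining model for risk/probability,
# assuming scikit-learn is installed; the toy data is made up.
from sklearn.tree import DecisionTreeClassifier

# Each row: [months_since_last_transaction, transactions_last_year]
X = [[1, 24], [2, 18], [11, 1], [12, 0], [3, 15], [10, 2]]
y = [0, 0, 1, 1, 0, 1]   # 1 = customer stopped transacting (churn)

model = DecisionTreeClassifier(max_depth=2, random_state=0)
model.fit(X, y)

# Estimated probability that a customer inactive for 9 months churns.
print(model.predict_proba([[9, 3]]))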
Data Mining
• The different layers of the data mining process are explained as follows:
Graphical user interface
Pattern evaluation module
Data mining engine
Database or data warehouse server
Database/data warehouse
Knowledge base
Data Mining
• Data Mining (DM) is the process of discovering trends and patterns from large
sets of data.
• Both the architecture and algorithms play a significant role in the mining process.
• Machine learning and predictive analytics are sometimes used synonymously, but they are two distinct disciplines.
• Models in predictive analytics can utilise one or more classifiers.
Introduction to Analytics – Session 6
Chapter 6: Prescriptive and Diagnostic Analytics
Business Development
• Understanding what new items are required, what differentiating components will make one item sell better than another, and which markets are demanding which items are key areas for prescriptive analytics, including:
Identifying and making choices about circumstances/emerging areas of unmet need
Predicting the potential advantage
Proactively following industry trends and implementing techniques to gain an advantage
Exploiting data analytics to distinguish particular buyer populations and regions that ought to be focused on
Overview of Prescriptive Analytics
Consumer Excellence
• Prescriptive analytics can be utilised to improve purchaser excellence in a huge
number of ways including:
Predicting what purchasers will need and settling on key choices that address
those necessities.
Segmenting purchasers and recognising and focusing on custom fitted
messages to them.
Staying on top of the competition and making decisions (e.g., about marketing and branding) that will prompt more desirable items and higher sales.
Overview of Prescriptive Analytics
Corporate Accounts
• Corporate account functions can make immense use of prescriptive analytics to improve their capacity to make choices that help drive internal excellence and external strategy.
• The following points describe internal excellence:
Viability and direction for non-item related activities; what choices ought to be
made and what is the effect.
Viability and direction for item related activities; what choices ought to be
made and what is the effect.
Overview of Prescriptive Analytics
Supply Chain
• Prescriptive analytics can likewise furnish supply chain functions with an upper hand through the capacity to predict and make decisions in a few basic areas, including:
Forecasting future demand and pricing (e.g., supplies, material, fuel and other components affecting cost) to guarantee proper supply.
Utilising prescriptive analytics to inform stock levels, plant schedules, truck routes and other components in the supply chain cycle.
Mitigating supplier risk by mining unstructured information alongside transactional information.
Better understanding historical demand patterns and product flow through supply chain channels, anticipating future patterns and making choices on future-state procedures.
Overview of Prescriptive Analytics
• Diagnostic analytics is used to find the root cause of a given situation. It can also be used to find the causal relationship between two or more data sets if the root cause is not detectable.
• The analytics team or person must be careful about selecting relevant data for
analysis or for finding relation among more than one data set.
• Example: You have done descriptive analytics and it shows low sales on your online grocery store website. After some event checks and analysis, it occurs to you that users are adding items to the cart but are not checking out. You now conclude that there is some issue with the user experience on your website, but what is it precisely? Many factors could be affecting sales: the payment page does not use the more secure HTTPS protocol, the payment options form does not work, or an unexpected charge amount appears on the page. Hence, diagnostic analysis enables you to present a picture and the cause behind it, which is not apparent in the presented data.
Diagnostic Analytics
• Rapidminer is a tool that provides users with an open-source analytics platform that delivers AI and prescriptive analytics to businesses.
• Alteryx is a self-service platform that delivers several kinds of products for the
business needs.
• Prescriptive analytics can automatically and continuously process new data to
improve forecast accuracy and offer better decision options.
• Prescriptive analytics can furnish supply chain functions with an upper hand through the capacity to predict and make decisions in a few basic areas.
Introduction to Analytics – Session 7
Chapter 7: Data Representation and Visualisation for Analytics
• The data is first analysed, and then the result of that analysis is visualised in different ways, as discussed above. There are two ways to visualise data: infographics and data visualisation.
Infographics
• Infographics are visual representations that convey information or data rapidly and accurately. The use of colourful graphics in drawing charts and graphs helps in improving the interpretation of given data.
Data Visualisation
• Data visualisation is the study of representing data or information in a visual
form. With the advancement of digital technologies, the scope of multimedia has
increased manifold.
• It is an established fact that the human mind can comprehend information more
easily if it is presented in the form of visuals.
Ways of Representing Visual Data
• News channels often integrate and present visuals related to accidents, natural
disasters, weather reports, and survey results to incite a realistic imagination in
the viewers’ mind.
• Visualisation is an excellent medium to analyse, comprehend, and share
information.
• Visual images help to transmit a huge amount of information to the human brain
at a glance.
• Visual interpretations help in exploring data from different angles, which help
gain insights.
• Visualisation helps in identifying problems and understanding trends and
outliers.
• Visualisations point out the key or interesting breakthroughs in a large dataset.
Ways of Representing Visual Data
• Data can be classified on the basis of the following three criteria irrespective of
whether it is presented as data visualisation or infographics:
Method of creation
Quantity of data displayed
Degree of creativity applied
• Various content types:
Graph
Diagram
Timeline
Template
Checklist
Flowchart
Mind map
Techniques Used for Visual Data Representation
• Data can be presented in various visual forms, which include simple line
diagrams, bar graphs, tables, matrices, etc. Some techniques used for a visual
presentation of data are as follows:
Isoline: It is a 2D data representation in which a curved line connects points of constant value across the surface of a graph.
Techniques Used for Visual Data Representation
• Streamline: It is a field line that results from the velocity vector field description
of the data flow.
• Hyperbolic Trees: They represent graphs that are drawn using the hyperbolic
geometry.
Techniques Used for Visual Data Representation
• You already know that data can be visualised in many ways, such as in the forms
of 1D, 2D, or 3D structures. The following are the different types of data
visualisations:
1D (Linear) data visualisation: In the linear data visualisation, data is
presented in the form of lists. Hence, we cannot term it as visualisation. It is
rather a data organisation technique. For example, a list of items organised in
a predefined manner.
2D (Planar) data visualisation: This technique presents data in the form of
images, diagrams, or charts on a plane surface. Cartogram and dot distribution
map are examples of 2D data visualisation. Some tools used to create 2D data
visualisation patterns are GeoCommons, Google Fusion Tables, Google Maps
API, Polymaps, Tableau Public, etc. For example, choropleth, cartogram, dot
distribution map, and proportional symbol map.
Types of Data Visualisation
• Data visualisation tools and techniques are used in various applications. Some of
the areas in which we apply data visualisation are as follows:
Education: Visualisation is applied to teach a topic that requires simulation or modelling of any object or process. An organ system or the structure of an atom is best described with the help of diagrams or animations.
Information: Visualisation is applied to transform abstract data into visual
forms for easy interpretation and further exploration.
Production: Various applications are used to create 3D models of products for
better viewing and manipulation. Real estate, communication, and automobile
industry extensively use 3D advertisements to provide a better look and feel to
their products.
Science: Every field of science, including fluid dynamics, astrophysics, and medicine, uses visual representations of information.
Applications of Data Visualisation
• Visual analysis of data is not a new thing. For years, statisticians and analysts
have been using visualisation tools and techniques to interpret and present the
outcomes of their analyses.
• Almost every organisation today is struggling to tackle the huge amount of data
pouring in every day. Data visualisation is a great way to reduce the turn-around
time consumed in interpreting Big Data.
• Traditional visualisation techniques are not efficient enough to capture or
interpret the information that Big Data possesses.
• Big Data comprises both structured as well as unstructured form of data collected
from various sources. Heterogeneity of data sources, data streaming, and real-
time data are also difficult to handle using traditional tools.
• The response time of traditional tools is quite high, making it unfit for quality
interaction.
Visualising Big Data
• Considering all these factors, IT companies are focusing more on the research and
development of robust algorithms, software, and tools to analyse the data that is
scattered in the Internet space.
• Tools such as Hadoop are providing state-of-the-art technology to store and
process Big Data.
• Analytical tools are now able to produce interpretations on smartphones and
tablets. It is possible because of the advanced visual analytics that is enabling
business owners and researchers to explore the data to find out trends and
patterns.
Visualising Big Data
• Visualisation of data can produce cluttered images, which are filtered with the help of clutter-reduction techniques. Uniform sampling and dimension reduction are two commonly used clutter-reduction techniques; a sketch of uniform sampling follows the list below.
• Visual quality metrics can be categorised as:
Size metrics (e.g., number of data points)
Visual effectiveness metrics (e.g., data density, collisions)
Feature preservation metrics (e.g., discovering and preserving data density differences)
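A minimal sketch of the uniform-sampling idea (matplotlib and the synthetic data are assumptions): plotting a random subset reduces clutter while preserving the overall shape of the data.

# A minimal sketch of uniform sampling as a clutter-reduction step,
# assuming matplotlib is installed; the data here is synthetic.
import random
import matplotlib.pyplot as plt

random.seed(0)
points = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(100000)]

# Uniformly sample 2% of the points before plotting.
sample = random.sample(points, k=len(points) // 50)

xs, ys = zip(*sample)
plt.scatter(xs, ys, s=2)
plt.title("Uniformly sampled scatter plot (2% of points)")
plt.show()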
Visualising Big Data
• Data is everywhere, but it is important to present it to users in a way that communicates all the necessary information effectively.
• Data can be presented in various visual forms, which include simple line
diagrams, bar graphs, tables, matrices, etc.
Let’s Sum Up
• Data visualisation tools and techniques are used in various applications, such as
education, information, production, science, visual analytics, etc.
• Visual analysis of data is not a new thing. For years, statisticians and analysts
have been using Visualisation tools and techniques to interpret and present the
outcomes of their analyses.
• Visual data mining also works on the same principle as simple data mining.
• Visualisation of data produces cluttered images that are filtered with the help of
clutter-reduction techniques.
Introduction to Analytics – Session 8
Chapter 8: Tools for Data Visualisation
• Working with various Big Data analytics systems necessitates the use of Big Data
visualisation. Decision-making becomes considerably easier if the flow of raw data
is represented with visuals. To meet and surpass the customer’s expectations, Big
Data visualisation tools should provide the following features:
The capacity to process many sorts of incoming data.
The ability to use numerous filters to fine-tune the results.
The capacity to interact with data sets during analysis.
The capacity to link to other software in order to accept incoming data or offer
input.
The ability to give users collaborative alternatives.
• The most common Big Data visualisation tools to assist you in selecting the best
match for your situation are discussed ahead.
Proprietary Data Visualisation Tools
Tableau
• Tableau is a data visualisation platform that allows data analysts, scientists, statisticians, and other professionals to analyse data and form clear conclusions based on their findings. Tableau handles large amounts of data quickly and produces the necessary data visualisation output.
• It is used to create and share interactive dashboards, which can depict the variation and density of data in various visual forms like charts and graphs.
• Tableau is accessible for individual data analysts as well as corporate teams and
companies on a larger scale. It offers a 14-day free trial before moving on to the
premium version.
• Tableau offers five main products to fit the diverse data visualisation requirements of professionals and organisations. They are:
Proprietary Data Visualisation Tools
QlikView
• Qlik is a software firm that mainly deals in data visualisation, executive
dashboards, and self-service business intelligence.
• Gartner consistently ranks Qlik as one of the top data visualization and business
intelligence (BI) suppliers in the industry, alongside Tableau and Microsoft.
QlikView, the company’s main product, enables visual data exploration, self-
service BI reporting, and the creation and sharing of data dashboards.
• Qlik Sense, the company’s second key service, provides for more free-form
analytics and allows customers to construct data and web applications via API
connections. Qlik Sense may be installed either on-premises or in the cloud. The
firm also runs a product called Data Market, which gives QlikView customers
access to a selected list of publicly available data sets such as census data,
financial data, and business filing data.
Proprietary Data Visualisation Tools
R
• R analytics is data analytics performed using the R programming language, which
is an open-source language used for statistical computation or graphics.
• Statistical analysis and data mining applications typically employ this
programming language. It may be used in analytics to find trends and create
useful models.
• R, which has graphical user interfaces available for program development, provides a wide range of analytical modelling approaches, including traditional statistical tests, clustering, time-series analysis, linear and nonlinear modelling, and others.
• A typical interface (such as RStudio) consists of four panes: the script window, the console window, the workspace and history window, and a window with tabs of interest (help, packages, plots, and files).
Proprietary Data Visualisation Tools
R
• R enables the creation of publication-ready plots and visualisations, as well as the
storing of reusable analytics for future data.
• It also supports extensions and additional tools such as RStudio and RExcel, making learning easier and faster for novice business analysts and other users. It has become an industry standard for statistical analysis and data mining projects, and its use is expected to increase as more R-trained analysts enter the profession.
Proprietary Data Visualisation Tools
Python
• Python is a prominent multi-purpose programming language that is extensively
used for its versatility as well as its large library collection, which is useful for
analytics and sophisticated computations.
• Because of Python’s versatility, it offers dozens of libraries dedicated to analytics,
including the widely used Python Data Analysis Library (also known as Pandas).
• The majority of Python data analytics libraries are developed in some way from
the NumPy library, which contains hundreds of mathematical computations,
operations, and functions.
• Python’s performance capabilities are higher than those of many other prominent data analytics languages, and its interoperability with a wider range of other languages means that it is often the more convenient choice.
Proprietary Data Visualisation Tools
Excel
• Excel is a Microsoft software tool that utilises spreadsheets to organise numbers and data using formulae and functions. Organisations of all kinds across the world use Excel to undertake financial analysis.
• Excel is commonly used for data organisation and financial analysis. It is utilised
in all business functions and at all sizes of businesses.
• Its primary role is to generate data reports. SAP Analytics Cloud is a cloud-based
solution that combines business intelligence, enhanced and predictive analytics,
and planning. It offers sophisticated analytics across the company as the analytics
layer of SAP’s Business Technology Platform.
Microsoft Power BI
• Excel can create different types of charts from data in spreadsheets, but it has restrictions in terms of what you can generate. If your company needs a more sophisticated data visualisation tool but wants to stay within the Microsoft environment, Power BI is a great option.
• Power BI was designed primarily as a data analytics and visualisation tool, and it
can import data from a variety of sources and create representations in a variety
of formats.
Proprietary Data Visualisation Tools
Sisense
• Sisense Fusion is an AI-powered software solution that consistently injects
intelligence in the appropriate place at the correct time.
• Sisense is a business intelligence-based data visualisation solution that offers a
variety of tools to help data analysts simplify difficult data and get insights for
their companies and outsiders.
• Sisense believes that, in the end, every firm will be data-driven and every product will be in some way tied to data. As a result, it makes every effort to deliver various data analytics tools to business teams and data analysts so that they may assist in transforming their organisations into the data-driven enterprises of the future.
Proprietary Data Visualisation Tools
Sisense
• Sisense Fusion Embed enables you to safely integrate analytics into any
ecosystem. Regardless of your technology stack, integrate analytics directly into
your products. Deploy on any cloud or on-premises, with a single tenant or multi-
tenant architecture, while maintaining complete control with end-to-end
governance and security that is completely automatable through rich APIs.
• For each unique dashboard, business analysts must configure and maintain
reports for each end-user (email content, subscription, and scheduling frequency).
Business users rely on business analysts to make any necessary adjustments to
report parameters, such as unsubscribing from a report.
• Sisense End-User Report Management, a BI reporting solution built inside the
Sisense data and analytics platform, gives business analysts management over
the analytics reports that are developed and disseminated.
Open-Source Data Visualisation Tools
Candela
• Candela is an open-source web visualisation component suite for Kitware’s
Resonant platform. Candela focuses on providing scalable, sophisticated
visualisations via a standardised API for use in real-world data science
applications. Among the integrated components are:
LineUp dynamic ranking, developed by the Harvard University Visual
Computing Group and the Caleydo project.
UpSet set visualisation, by the Harvard University Visual Computing Group
and the Caleydo project.
OnSet set visualisation, created by the Georgia Institute of Technology
Information Interfaces Group.
Vega visualisations from the University of Washington Interactive Data Lab;
ScatterPlot is an example component.
Open-Source Data Visualisation Tools
Candela
GeoJS geographic visualisations from Kitware’s Resonant platform; GeoDots
is an example component.
• Candela is unquestionably one of the strongest open-source JavaScript options
for data visualisation. The package includes a standardised API for use in
real-world data science applications and is available via the Resonant platform.
• Candela may be installed from standard package repositories or from source.
While the initial installation procedure is straightforward, there aren’t many
public release versions available. Installing from source, which is slightly more
difficult, lets the user run cutting-edge development versions.
Open-Source Data Visualisation Tools
Charted
• Charted is a free and open-source data visualisation tool licensed under the MIT
license. It was originally created by the blogging platform Medium.
• All you have to do is give a link to a data file, and the system will extract a
complete, well-choreographed, and easily accessible collection of data from it.
• The platform was designed with usability in mind. The majority of the functions
are already automated.
• It displays its results equally effectively on different screen sizes.
• Its charts are updated on a regular basis (at 30-minute intervals).
• It excels in data sorting by separating data series and graphics.
• With Charted, you can also sort accessible data by type, backdrop, titles or labels,
and other criteria.
Open-Source Data Visualisation Tools
Chart.JS
• Chart.js is an open-source library (available on GitHub) that allows you to quickly
visualise data using JavaScript. It is comparable to Chartist and Google Charts.
• It offers eight distinct chart types (including bars, lines, and pies), all of which
are responsive. In other words, you create your chart once, and Chart.js does the
heavy lifting for you, ensuring that it is always legible (for example, by removing
noncritical details if the chart gets smaller).
• Here’s all you need to do to create a chart with Chart.js:
1. Determine where you want the graph to appear on your page.
2. Determine the type of graph you wish to create.
3. Provide data, labels, and other settings to Chart.js.
Open-Source Data Visualisation Tools
D3.JS
• D3.js may be used to visualise Big Data in virtually any way. D3.js is an
abbreviation for Data-Driven Documents, a JS library enabling interactive Big
Data presentation in virtually any real-time fashion.
• Because D3.js is a library rather than a ready-made tool, a user needs a basic
grasp of JavaScript in order to interact with the data and display it in a
human-readable format.
• To elaborate, because this library presents the data in SVG and HTML5 formats,
earlier browsers such as IE7 and 8 are unable to take advantage of D3.js features.
Data from various sources, such as large-scale data sets, is coupled in real time
with the DOM to make interactive animations (2D and 3D alike) exceptionally
fast.
• The D3 design enables users to heavily reuse code across a wide range of
add-ons and plug-ins.
Open-Source Data Visualisation Tools
Datawrapper
• Datawrapper, like Google Charts, is a tool for creating charts, maps, and other
visualisations for use online. The tool’s original target audience was reporters
working on news items, but it may be useful to any professional in charge of
administering a website.
• While Datawrapper is simple to use, it is fairly limiting when compared to the
other tools on our list. One of the main drawbacks is that it does not interface with
data sources. Instead, you must manually copy and paste data into the tool, which
may be time-consuming and error-prone if not done correctly.
• Depending on how you wish to use the tool, you can choose between free and
premium options.
Open-Source Data Visualisation Tools
Dygraphs
• Dygraphs is an open-source JavaScript graphing toolkit that is quick and
configurable. It enables people to study and comprehend large amounts of data.
• The features of the Dygraphs are as follows:
Handles massive data sets: Dygraphs can plot millions of points without
getting bogged down.
Interactive right out of the box: zooming, panning, and mouseover are all
enabled by default.
There is a lot of support for error bars and confidence intervals.
You can make dygraphs perform practically anything by utilising parameters
and custom call-backs.
Dygraphs is compatible with all modern browsers. On mobile/tablet devices,
you can even pinch to zoom.
There is a vibrant community of people working on and supporting dygraphs.
Open-Source Data Visualisation Tools
Leaflet
• Leaflet is a free and open-source JavaScript library for creating mobile-friendly
interactive maps. One of the nicest features of this tool is that it is incredibly
lightweight, at only about 38 KB of JS.
• The tool is created in such a manner that it has nearly all of the mapping
functionality that most developers would ever require.
• It works well right out of the box on all major desktop and mobile platforms,
making use of HTML5 and CSS3 on newer browsers while being accessible on
older ones.
Open-Source Data Visualisation Tools
RAW Graphs
RAW Graphs, which is built on D3.js, makes data gathering and visualisation a
breeze. Other features and functions of this tool include:
• It includes several methods for automatically sourcing and displaying data. You
may, for example, enter comma-separated-values, tab-separated values, or just
copy and paste from a spreadsheet, and RAW Graphs will turn it into beautiful
output.
• It uses a broad range of graphic models to present data, including both
traditional models (such as pie/bar charts and line graphs) and more
unorthodox ones.
• RAW Graphs allows you to gain a better understanding of your data collection by
analysing its patterns and trends. You may accomplish this by using visual
variables to map the various aspects of your data collection.
• You can export and edit your output as vector or raster graphics anywhere.
Advantages and Disadvantages of Good Visualisation
• D3.js (Data-Driven Documents) is a JS library enabling interactive Big Data
presentation in virtually any real-time fashion.
• Datawrapper is a tool for creating charts, maps, and other visualisations for use
online.
• Dygraphs is a free and open-source JavaScript-based charting framework that
allows users to explore and comprehend large amounts of data.
• Leaflet is a free and open-source JavaScript library for creating mobile-friendly
interactive maps.
• RAW Graphs is an open-source data visualisation platform designed to make
visualising complicated data simple for everyone.
Introduction to Analytics – Session 9
Chapter 9: Social Media Analytics and Text Mining
• The most common form of information and content exchange on social networking
sites is text.
• Online marketers and business analysts examine and interpret online content
using social media analytics. This analysis helps them adjust and mould their
business objectives according to customer behaviour. For example, the reviews
posted by customers on websites or social media, in the form of text or rating
scores, enable organisations to understand and analyse customers’ perspectives
and expectations.
• Certain tools and methodologies are required to read, interpret, and analyse the
large number of reviews received on a daily basis. This is accomplished by text
mining.
• Text mining or text analytics comes in handy as a tool to quantitatively examine
the text generated on social media and filter it into different clusters, patterns,
and trends.
Text Mining
• Text mining employs the concepts obtained from various fields ranging from
linguistics and statistics to Information and Communication Technologies (ICT).
• Statistical pattern learning is applied to create patterns from the extracted text,
which are further examined to obtain valuable information. The overall process of
text mining comprises retrieval of information, lexical analysis, creation and
recognition of patterns, tagging, extraction of information, application of data
mining techniques, and predictive analytics. This can be summarised as follows:
Text mining = Lexicometry + Data mining
• [Figure: the text mining process]
Text Mining
• Clustering approaches used in text mining include:
One-pass clustering
Buckshot clustering
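As an illustration of the pattern-recognition stage, the following minimal sketch vectorises a few invented documents with TF-IDF and clusters them. Note that scikit-learn's KMeans stands in here for the one-pass and buckshot algorithms named above, which have no standard library implementation:

```python
# Minimal text mining sketch: lexical analysis via TF-IDF, then clustering.
# KMeans is a stand-in for one-pass/buckshot clustering; the documents
# are invented examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = [
    "great battery life and fast delivery",
    "battery died quickly, poor quality",
    "fast shipping, great packaging",
    "terrible quality, would not buy again",
]

tfidf = TfidfVectorizer(stop_words="english")   # lexical analysis step
X = tfidf.fit_transform(docs)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)   # cluster assignment per document
```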
• Sentiment analysis is one of the most important components of text mining. Also
termed opinion mining, it involves careful analysis of people’s opinions,
sentiments, attitudes, appraisals, and evaluations.
• This is accomplished by examining large amounts of unstructured data obtained
from the Internet on the basis of positive, negative, or neutral view of the end
user.
• Sentiment analysis involves the analysis of following sentences:
Facts: Product A is better than product B.
Opinions: I don’t like A. I think B is better in terms of durability.
• Sentiment analysis draws on other domains, such as linguistics, digital
technologies, text analysis tools, artificial intelligence, and Natural Language
Processing (NLP), for the identification and extraction of useful information. Its
results influence various domains ranging from politics and science to social
science.
Sentiment Analysis
• The process of sentiment analysis begins by tagging words using Parts of Speech
(POS) such as subject, verb phrase, verb, noun phrase, determiner, and
prepositions.
• Defined patterns are filtered to identify their sentiment orientation. For example,
‘beautiful rooms’ has an adjective followed by a noun; the adjective ‘beautiful’
indicates a positive perspective about the noun ‘rooms’.
• At this stage, the emotional factor in the phrase is also examined and analysed.
After that, an average sentiment orientation of all the phrases is computed and
analysed to conclude whether a product is recommended by a user.
• In practice, sentiment analysis employs various online tools to effectively
interpret consumer sentiments. Some of these tools are: Topsy, BackTweets,
Twitterfall, TweetBeep and Reachli.
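The following is a minimal sketch of the phrase-averaging process described above, using a tiny hand-made lexicon in place of a real POS tagger and sentiment lexicon; the phrases and scores are invented purely for illustration:

```python
# Minimal sentiment sketch: score opinion words in each phrase, then
# average the orientations. The tiny lexicon and review phrases are
# invented; real systems use POS taggers and large sentiment lexicons.
LEXICON = {"beautiful": 1.0, "great": 1.0, "clean": 0.5,
           "noisy": -0.5, "terrible": -1.0, "dirty": -1.0}

def phrase_orientation(phrase: str) -> float:
    scores = [LEXICON[w] for w in phrase.lower().split() if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

review_phrases = ["beautiful rooms", "noisy corridor", "great location"]
avg = sum(phrase_orientation(p) for p in review_phrases) / len(review_phrases)
print("average orientation:", avg,
      "->", "recommend" if avg > 0 else "do not recommend")
```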
Online Social Media Analysis
• Social media analysis is now broadly used by marketing agencies, social media
managers, PR and communications experts, political groups, journalists and teams
across the enterprise to effectively observe and engage users on social channels and
carry out research more efficiently.
• Social Media Analysis can be optimised with:
Campaigns: Social analytics to evaluate the effectiveness of your marketing
campaigns and validate investments in social media by gauging social media ROI.
Influencers: Find the best influencers to amplify and augment your brand assets by
classifying those having the maximum reach or greatest impact in your target
market.
Audiences: Audience analytics to classify high-value groups and discover their
interests, demographics and brand affiliations to develop personalised content and
more engaging campaigns.
Online Social Media Analysis
• With the right tool, firms can use social media analysis to notice patterns in
customer behaviour and concerns, and in online chatter about a specific person,
product, or topic.
• By getting to know influences and swings in online conversation, firms can
identify evolving trends while learning more about how consumers use and speak
about the products and services—or those of their rivals.
• Social Mention, Hootsuite Analytics, Google Analytics, UTM parameters,
Brandwatch, Talkwalker, Facebook Analytics, Twitter Analytics, Instagram
Insights, Snapchat Insights, Pinterest Analytics and LinkedIn Page Analytics are
some of the most common analytics tools available.
Online Social Media Analysis
• Text mining represents the set of tools, techniques, and methods applied for
automatically processing natural language textual data provided in huge amounts
in the form of computer files. Sentiment analysis is one of the most important
components of text mining.
• Sentiment analysis applies other domains such as linguistics, digital technologies,
text analysis tools, artificial intelligence, and Natural Language Processing (NLP)
for identification and extraction of useful information.
• Artificial intelligence is a technology and a branch of science that deals with the
study and development of intelligent machines and software.
• Social media analysis is now broadly used by marketing agencies, social media
managers, PR and communications experts, political groups, journalists and
teams across the enterprise to effectively observe and engage users on social
channels and carry out research more efficiently.
Introduction to Analytics – Session 10
Chapter 10: Mobile Analytics
• When you look at the past, you will see that wireless mobile technologies have
shown a steady growth, evolving from 1G to 4G. With every major shift in the
technology, there has been a corresponding improvement in both the speed and
efficiency of mobile devices.
• First generation (1G) mobile devices provided only a “mobile voice”, but in second
generation (2G) devices, larger coverage and improved digital quality were
provided. Third generation (3G) technology focused on multimedia applications
like videoconferencing through mobile phones. 3G opened the gates for the mobile
broadband, which was seen in fourth generation (4G) devices.
Introducing Mobile Analytics
• Mobile analytics has several similarities with Web and social analytics; for
example, both can analyse the behaviour of the user with regard to an application
and send this information to the service provider.
• However, there are also several important differences between Web analytics and
mobile analytics:
Analytics segmentation: Mobile analytics works on the basis of the location of
the mobile devices. For example, suppose a company is offering a cab service
in a city like New York. In this case, the company can use mobile analytics to
identify target people travelling in New York. Mobile analytics works for
location-based segments, while Web analytics works globally.
Complexity of code: Mobile analytics requires more complex code and
programming languages to implement than Web analytics, which is easier to
code.
Mobile Analytics and Web Analytics
• Three major types of results from mobile analytics are explained below:
Advertising/marketing analytics: Commonly the following marketing analytics
data are collected: Installs, Opens, Clicks, Purchases, Registrations, Content
viewed, Level achieved, Shares, Invites and Custom events.
In-App analytics: Commonly the following in-app analytics data are collected:
Device profile (device type such as mobile phone or tablet; manufacturer;
operating system)
The factors which can impact the performance of an app irrespective of how well it
was coded include:
App complexity
Hardware variation
Available operating systems
Carrier/network
Commonly the following performance analytics data are collected:
API latency
Carrier/network latency
Data transactions
Crashes
Exceptions
Errors
Types of Applications for Mobile Analytics
• A website can be opened on both computers and mobile phones, while a mobile site
can be opened only on mobile phones; responsive-designed sites, on the other
hand, can open on any device, such as a computer, tablet or mobile phone.
• Difference among the website, mobile site and Responsive design site:
Types of Applications for Mobile Analytics
• The fundamental task of the mobile analytics tool is similar to other digital
analytical tools like Web analytics. They capture data, collect it, and help to
generate reports that can be used meaningfully after processing.
• The selection of an analytic tool is not an easy process because these tools are new
and undergo rapid enhancements as compared to traditional web analytics tools.
• Following are some points to be considered while selecting mobile analytics tools:
What is your analytical goal?
Analysis techniques
Way of presentation
• There are two classes of mobile analytics tools:
Internal mobile analytics tools
External mobile analytics tools
Mobile Analytics Tools
• To integrate mobile analytics with business processes, you must perform the
following basic steps:
Select the appropriate mobile device like a smartphone or tablet.
List the objectives of mobile analytics for your business process.
Identify the target audience and create a dataset for it.
Clean the dataset by imputing missing values and removing unnecessary data.
Use the dimension-reduction technique and transform data (if possible).
Perform some required data mining techniques, like text mining.
Select data mining algorithms and do some analysis for mobile mining.
Apply the data mining algorithms and verify the relationships among the
variables.
Evaluate the result/interpretation.
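As a hedged illustration of the data-preparation steps above (imputing missing values, removing unnecessary data, and reducing dimensions), here is a minimal sketch; the mobile-usage dataset and column names are invented:

```python
# Sketch of the preparation steps: impute missing values, drop an
# unneeded column, and reduce dimensions with PCA. All data is invented.
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

usage = pd.DataFrame({
    "session_minutes": [12.0, np.nan, 30.0, 7.0],
    "screens_viewed":  [5, 9, 14, np.nan],
    "crashes":         [0, 1, 0, 0],
    "device_id":       ["a1", "b2", "c3", "d4"],   # identifier, not a feature
})

features = usage.drop(columns=["device_id"])      # remove unnecessary data
features = features.fillna(features.mean())       # impute missing values

reduced = PCA(n_components=2).fit_transform(features)  # dimension reduction
print(reduced.shape)   # (4, 2)
```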
Performing Mobile Analytics
• Mobile analytics has its own challenges. Some of the main challenges can be listed
as follows:
Unavailability of uniform technology
Random change in subscriber identity
Redirect
Special characters in the URL
Interrupted connections
Limited understanding of the network operators
True real-time analysis
Security issues
Let’s Sum Up
• Mobile Web refers to the use of mobile phones or other devices like tablets to view
online content via a light-weight browser for the mobile.
• The term mobile app is short for the term mobile application software. It is an
application program designed to run on smartphones and other mobile devices.
• The fundamental task of the mobile analytics tool is similar to other digital
analytical tools like Web analytics.
• A location-based tracking tool stores information about the location of mobile
devices (or the location of user).
• User-behaviour tracking tool is a software tool that tracks user behaviour with
any particular mobile application.
• Data for analysis is collected on mobile devices and sent back to the server for
further manipulation. Data collected by a mobile device is ultimately transferred
to its server for analysis.
Introduction to Analytics – Session 11
Chapter 11: Business Analytics in Practice-I
• Financial analytics provides various views of the financial data of a business,
supporting in-depth knowledge and strategic actions towards improving the
overall performance of the business.
• Financial analytics is a part of business intelligence (BI) and enterprise
performance management (EPM), with an impact on every arena of a business. It
plays a vital part in calculating a business’ profit.
• Financial analytics helps in shaping the future goals of a business and can
improve the strategies behind its decisions. It focuses on measuring and managing
tangible assets such as cash and equipment, providing detailed insight into the
financial status of the business and improving its profitability, cash flow and
value.
• Here are certain critical financial analytics which an organisation needs to
implement irrespective of size:
Predictive sales analytics: An informed forecast of sales that helps to plan and
manage highs and lows
Financial and Fraud Analytics
• Fraud analytics, on the other hand, is the use of big data analysis techniques to
prevent online financial fraud.
• The global lockdowns owing to COVID-19 in 2020 pushed even more customers
towards using online banking for at least a certain part of their financial
activity.
• Online fraud, which was already increasing every year, has followed suit. Account
takeover (ATO), a particularly prevalent type of financial fraud, multiplied
280 percent just between Q2 of 2019 and Q2 of 2020; therefore, financial
institutions must apply detailed fraud management measures immediately to
protect the accounts of their customers.
• Collecting a username and password at login is no longer sufficient to guard
against fraud. When someone accesses or attempts to access an account, other
types of data can be used to assess the legitimacy of the customer, which in turn
helps determine the legitimacy of the requested transaction.
Financial and Fraud Analytics
• Such data includes: the type of device being used, whether that device was
previously registered, whether a fingerprint is available to verify identity, and
whether the requested transaction fits the customer’s historical patterns.
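As a purely illustrative sketch, the signals listed above can be combined into a simple risk score; the weights and thresholds here are invented, and real fraud systems rely on statistical models rather than hand-set rules:

```python
# Illustrative only: combine the listed signals into a simple risk score
# for a login/transaction. Weights and thresholds are invented.
def fraud_risk(device_registered: bool, fingerprint_verified: bool,
               amount: float, typical_max_amount: float) -> float:
    score = 0.0
    if not device_registered:
        score += 0.4          # unknown device raises risk
    if not fingerprint_verified:
        score += 0.3          # no biometric confirmation
    if amount > typical_max_amount:
        score += 0.3          # transaction outside historical pattern
    return score

# A request from an unregistered device for an unusually large amount:
print(fraud_risk(False, True, amount=5000, typical_max_amount=800))  # 0.7
```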
• Fraud impacts organisations in several ways which might be related to financial,
operational or psychological processes.
• As fraud can be executed by any worker inside an organisation or by an external
source, an organisation needs to have successful fraud management or a fraud
analytics program to defend its reputation against fraud and prevent financial
loss.
• The capacity to analyse enormous data volumes enables organisations to build
accurate models for recognising and preventing future fraud.
• Advanced analytics can also be applied to all key fraud data to predict whether
an activity is potentially fraudulent before losses occur.
Financial and Fraud Analytics
• Data management software enables auditors and fraud analysts to analyse an
organisation’s business data to gain insight into how well internal controls are
operating and to identify transactions that appear fraudulent.
• Companies also use whistleblower hotlines, which help individuals report
suspected fraudulent or unsafe conduct and violations of law and policy.
• Analysing business transactions at the source level gives auditors better
knowledge and a more complete view of the probability of fraud occurring. The
analysis covers suspicious activities and control weaknesses that could be
exploited by fraudsters.
HR Analytics
• The field of HR analytics can be further divided into the following segments:
Capability analytics
Competency acquisition analytics
Capacity analytics
Employee churn analytics
Corporate culture analytics
Recruitment channel analytics
Employee performance analytics
Marketing Analytics
• You need to follow the below three steps to get the benefits from marketing
analytics:
1. Practice a balanced collection of analytics methods: In order to get the best
benefits from marketing analytics, you need an analytical evaluation that is
balanced – that is, one that merges methods covering the past, exploring
the present, and predicting and influencing what’s to come.
2. Evaluate your analytical capabilities and fill in the gaps: Marketing
organisations have access to many analytical capabilities for supporting
different marketing goals. Estimating your present analytical capabilities is
necessary to attain these goals. It is important to know where you currently
sit along the analytical spectrum, so that you can determine gaps and create
a strategy for filling them. A marketing organisation can then plan and
allocate budget for adding the analytical capabilities needed to fill each
particular gap.
Marketing Analytics
Optimising processes
• Marketing analytics makes your marketing efforts and investments better and
more successful. It can lead to better management, which helps in generating
more revenue and greater profitability.
• The following are attained commonly through marketing analytics:
Marketers frequently use key metrics such as lead-source monitoring and
cost-per-lead without having a comprehensive grasp of how marketing actions
affect them.
Marketing Analytics
By automatically connecting and unifying your data, you can spend more time
acting on hard-won insights and less time on stressful reporting tasks.
Using marketing ROI reports to make decisions that boost sales.
The individual contribution of each marketing programme becomes visible,
showing how marketing campaigns influence sales at each stage of the
customer journey.
Better visibility into future revenue: it allows forecasting of how many new
leads, opportunities, and customers marketing will gain in the future.
Web Analytics
• Web analytics refers to the measuring, collecting, analysing and reporting of web
data to understand and optimise Web usage. Web analytics is the process of
calculating and analysing data in order to have a better knowledge of user
activity on websites.
• Web analytics also helps companies measure the outcomes of traditional print
or broadcast advertising campaigns, estimate how traffic to a website changes
after the launch of a new advertising campaign, provide accurate figures of
website visitors and page views, and gauge Web traffic and popularity
patterns, which are useful in market research. The four basic steps of Web
analytics are as follows:
1. Collection of information: This stage involves gathering basic or elementary
data, which mostly involves counting things.
2. Processing of data into information: The purpose of this stage is to process the
collected data and derive information from it.
Web Analytics
3. Developing KPIs: This stage focuses on using the derived information
together with business strategies to form Key Performance Indicators (KPIs).
4. Formulating an online strategy: This stage emphasises setting online goals,
objectives and standards for the organisation or business. It also lays
emphasis on making and saving money and increasing market share.
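To make step 3 concrete, the following minimal sketch derives one commonly used KPI, bounce rate (single-page sessions divided by total sessions), from invented session records:

```python
# Sketch of processing collected data into a KPI. From raw session
# records (collection) to a bounce-rate KPI; the records are invented.
sessions = [
    {"visitor": "v1", "pages_viewed": 1},
    {"visitor": "v2", "pages_viewed": 4},
    {"visitor": "v3", "pages_viewed": 1},
    {"visitor": "v4", "pages_viewed": 2},
]

total = len(sessions)                                        # basic counting
bounces = sum(1 for s in sessions if s["pages_viewed"] == 1) # single-page visits
bounce_rate = bounces / total                                # derived KPI

print(f"bounce rate: {bounce_rate:.0%}")   # 50%
```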
• There are two categories of Web analytics: off-site Web analytics and on-site Web
analytics.
• Off-site Web analytics allows Web measurement and analysis irrespective of
whether you own or maintain a website. It includes the measurement of a
website’s potential audience, its visibility and the comments being made about it
on the Internet.
• On-site Web analytics measures the behaviour of a visitor once on the website.
On-site Web analytics is used to measure the effectiveness and performance of
your website in a commercial context.
Web Analytics
• Google Analytics and Adobe Analytics are popular on-site Web analytics services.
• There are mainly two technical methods of gathering the data. The first method
is server log file analysis, which reads the log files in which the Web server
records the file requests sent by browsers.
• The second method, known as page tagging, uses JavaScript embedded in the Web
page to track it. Both methods can gather data that can be processed to
generate reports of Web traffic.
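As a minimal sketch of the first method, the following parses Common Log Format lines, the format many Web servers use to record requests, and counts page views per path; the log lines are fabricated examples:

```python
# Minimal server log file analysis: read log lines and count page views.
# The two Common Log Format lines are fabricated examples.
import re
from collections import Counter

LOG_PATTERN = re.compile(r'"(?:GET|POST) (?P<path>\S+) HTTP/[\d.]+"')

log_lines = [
    '10.0.0.1 - - [01/Jan/2024:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jan/2024:10:00:05 +0000] "GET /pricing HTTP/1.1" 200 713',
]

views = Counter()
for line in log_lines:
    match = LOG_PATTERN.search(line)
    if match:
        views[match.group("path")] += 1

print(views.most_common())   # page views per path
```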
• Some important uses of Web analytics for business growth are as follows:
Measure Web traffic
Estimate visitors count
Track bounce rate
Identify exit pages
Identify target market
Sport Analytics
• The information gathered in sports is analysed by coaches, players and other staff
members for decision making both during and prior to sporting events. With rapid
advancement in technology in the past few years, data collection has become
more precise and relatively easier than before.
• The advancement in the collection of data has also contributed to the growth of
sports analytics, as analytics relies entirely on the collected pool of data. The
growth in analytics has further led to the building of technologies such as fitness
trackers, game simulators, etc.
• Fitness trackers are smart devices that provide data about the fitness of players,
on the basis of which coaches can decide whether or not to include particular
players in the team. Game simulators help in practising the game before the
actual sporting event takes place.
• Sports analytics is the practice of applying numerical and statistical principles
to sports and related activities.
Sport Analytics
• National Basketball Association (NBA) teams now use player tracking
technology that can evaluate the efficiency of a team by analysing the movement
of its players. According to the SportVU software website, NBA teams have
installed six cameras to track the movements of each player on the court, and of
the basketball, 25 times per second. The data collected by these cameras
provides a significant amount of innovative statistics based on speed, player
separation and ball possession.
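As a hedged sketch of how a speed statistic can be derived from tracking data sampled 25 times per second, consider the following; the court coordinates are invented:

```python
# Sketch: derive player speed from position samples taken 25 times/second.
# The (x, y) coordinates are invented illustration data.
import numpy as np

SAMPLE_RATE_HZ = 25                      # samples per second
positions = np.array([                   # (x, y) in metres, one row per sample
    [0.0, 0.0], [0.1, 0.0], [0.25, 0.1], [0.45, 0.2],
])

# Distance covered between consecutive samples, then speed = distance * rate.
steps = np.diff(positions, axis=0)
distances = np.linalg.norm(steps, axis=1)
speeds = distances * SAMPLE_RATE_HZ      # metres per second

print("average speed (m/s):", speeds.mean().round(2))
```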
• Sports analytics has also found its application in the field of sports gambling. The
availability of more accurate information about teams and players on the websites
leads sport gambling to new levels.
• Sports gambling contributes about 13% of the global gambling industry, which is
valued at somewhere between $700 billion and $1,000 billion. Some of the popular
websites which provide betting services to users are bet365, bwin, Paddy Power,
betfair and Unibet.
Analytics for Government and NGOs
• Data analytics is also playing its role in the government sector. Not only is it
important for government, it is also equally beneficial for non-governmental
organisations (NGOs).
• These organisations use data analytics to obtain deeper insight into their data.
These details are used by the organisations to modernise their services, track
progress and determine solutions faster.
• A lot of data is generated in the government sector, and processing and analysing
this data helps the government improve its policies and services for citizens.
• Some benefits of data analytics in government sector are as follows:
With the help of data analytics, intelligence organisations can detect crime
prone areas and be prepared to prevent or stop any kind of criminal activity.
The analytics also help in detecting the possibility of the cyber-attacks and
identifying criminals.
Analytics for Government and NGOs
Government can use analytics to track and monitor health of its citizens. It can
also be used for tracking disease patterns. The government can launch proper
healthcare facilities in advance in the areas prone to diseases. It also helps in
arranging and managing free medicines, vaccinations, etc. in order to save life
of people.
Real-time analysis and sensors help government departments with water
management in the city. Departments can take timely action on detected
problems to ensure the supply of clean water in the city.
Government organisations also use analytics to detect tax frauds and predict
the revenue. Government can take necessary steps to prevent tax frauds and
increase the revenue.
Analytics for Government and NGOs
• For example, in India, the government led by Prime Minister Narendra Modi has
been encouraging people to adopt the Digital India initiative. This will ease the
collection and speed up the availability of data for analytics, helping to detect
flaws in money transactions and prevent people from becoming victims of fake
currency.
• Data analytics also helps NGOs in improving their services to needy or poor
people. Mainly, NGOs help people in several ways such as by providing free
education, books, medicines, clothes, etc. NGOs use data analytics to become more
efficient while raising and allocation of funds, predicting trends and planning
campaigns, identifying prospective donors and encouraging donors who have made
contributions earlier, etc.
• Besides Akshaya Patra, several other large NGOs, such as the Bill and Melinda
Gates Foundation India, Save the Children India and Child Rights and You (CRY),
are also utilising data to raise their efficiency in getting and allocating funds,
predicting trends and planning campaigns.
Analytics for Government and NGOs
• Governments these days use data analytics in order to stay informed for proactive
measures in various situations like:
Crime reduction
Fighting human trafficking
Improvement of food inspections
Preparation for natural disasters
Reduction of homelessness
Prediction of cyber-attacks
Prevention of child abuse and fatalities
Prevention of accidents
Let’s Sum Up