
Business Analytics

Business analytics
• is a set of statistical and operations research techniques, artificial
intelligence, information technology and management strategies
used for framing a business problem, collecting data, and
analyzing the data to create value to organizations
• 3 components of Business Analytics:
– Business Context
– Technology
– Data Science
• Business Context: the ability to ask the right questions
• Technology: automation of actionable items derived from
analytical models; automation of actionable items is usually
achieved using IT
• Data Science: identify the most appropriate statistical
model/machine learning algorithm that can be used
bigbasket.com
• Fernandes et al. (2013) reported that on average, customers
forget 30% of the items they intend to buy
• Forgetfulness can have significant cost impact for the online
grocery stores
– customers may buy the forgotten items from a nearby store
where they live
– customer may place another order for forgotten items
• The ability to predict the items that a customer may have
forgotten to order can have a significant impact on the profits of
online grocers such as bigbasket.com
• The "did you forget" feature is offered by the Indian online grocery store
bigbasket.com
Akshaya Patra Foundation
• From the Vasanthapura kitchen in Bangalore, approximately 84,000
school children from 650 schools in South Bangalore were
provided mid-day meals
• The Vasanthapura kitchen used 35 vehicles to distribute the
cooked food. To minimize the cost of distribution, they need to
solve a complex vehicle routing problem (VRP).
• To simplify this problem, assume that they divide the number
of schools equally among the vehicles; each vehicle would
then have to deliver food to approximately 20 schools (few
vehicles are kept as standby). For each vehicle, we need to
find the best route.
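A minimal sketch of the per-vehicle routing step, assuming straight-line distances and a greedy nearest-neighbour rule. The coordinates and school names are hypothetical, and the heuristic is a simple stand-in, not the method the foundation actually used; it gives a reasonable route but not necessarily the best one.

```python
import math

def nearest_neighbour_route(depot, stops):
    """Greedy route: from the current location, always drive to the closest unvisited stop."""
    remaining = dict(stops)          # stop name -> (x, y)
    current = depot
    route = []
    while remaining:
        name = min(remaining, key=lambda n: math.dist(current, remaining[n]))
        current = remaining.pop(name)
        route.append(name)
    return route

def route_length(depot, stops, route):
    """Total distance of kitchen -> stops in order -> back to kitchen."""
    points = [depot] + [stops[n] for n in route] + [depot]
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

# Hypothetical coordinates (km) for the schools assigned to one vehicle.
kitchen = (0.0, 0.0)
schools = {"S1": (2.0, 0.0), "S2": (5.0, 0.0), "S3": (1.0, 0.0), "S4": (5.0, 3.0)}

route = nearest_neighbour_route(kitchen, schools)
print(route, round(route_length(kitchen, schools, route), 2))
```

With about 20 schools per vehicle, checking every possible ordering is impractical, which is why heuristics of this kind (or dedicated optimization solvers) are used in practice.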
Changing Business Environments – 1/2
• Traditional applications
– Payroll
– Bookkeeping
• Current applications
– Complex managerial areas, such as design & automated
factories
– Evaluation of proposed mergers and acquisitions
– Nearly all executives use IT as it is vital to their business
• Analytics are used to develop reports on
– What is happening?
– Predict what is likely to happen?
– Make decisions to make best use of the situation
Changing Business Environments – 2/2
• Transition from processing and monitoring activities to problem
analysis and solution
• Transition from standalone applications to cloud-based
technologies and mobile devices
• Analytics and BI tools for modern management
– Data warehousing
– Data mining
– Online Analytical Processing (OLAP)
– Dashboards
• High speed network systems
– Wireline/Wireless
• Automation of routine decisions – eliminate need for managerial
interventions
Factors Facilitating Growth of BI & Analytics – 1/2
• Increased hardware, software, and network capabilities
• Group communication and collaboration
– Helps supply chains react to marketplace changes faster
• Improved data management
– Complex computation using multiple databases, multiple media
– Systems can search, store, transmit data quickly from distinct
locations, economically, securely & transparently
• Managing giant data warehouses and Big Data
– Cost of big data storage and mining are declining rapidly
– Special methods to organize, search, and mine
▪ Parallel computing, Hadoop/Spark
Hadoop and Spark
• Hadoop and Spark, both developed by the Apache Software Foundation, are
widely used open-source frameworks for big data architectures
• The Hadoop ecosystem
– Enables big data analytics processing tasks to be split into smaller tasks.
– The smaller tasks are distributed across a Hadoop cluster (i.e., nodes that
perform parallel computations on big data sets)
– They are then performed in parallel by using an algorithm (e.g., MapReduce)
– It is a highly scalable, cost-effective solution that stores and processes structured,
semi-structured and unstructured data

• The Spark ecosystem


– Enables users to perform large-scale data transformations and analyses, and then
run state-of-the-art machine learning (ML) and AI algorithms
– Like Hadoop, Spark splits up large tasks across different nodes
– It uses random access memory (RAM) to cache and process data instead of a file
system (and so typically performs faster than Hadoop)

• Hadoop is ideal for batch processing and linear data processing. Spark is ideal
for real-time processing and processing live unstructured data streams
Factors Facilitating Growth of BI & Analytics – 2/2
• Analytical support
– Perform complex simulations, check many possible
scenarios
– Assess diverse impacts quickly & economically
• Overcoming cognitive limits in processing and storing information
– Quickly accessing and processing vast amounts of stored
information
• Knowledge management
– Text Analytics and IBM Watson derive value from
unstructured communication between stakeholders
• Anywhere, anytime support
– Perhaps the biggest change
Evolution of Computerized Decision
Support to Analytics/Data Science
Decision Support Systems (DSS)
• Interactive computer-based systems, which help decision makers
utilize data and models to solve unstructured problems (Gorry &
Scott-Morton 1971)
• Decision support systems couple the intellectual resources of
individuals with the capabilities of the computer to improve the
quality of decisions. It is a computer-based support system for
management decision makers who deal with semi-structured
problems (Keen & Scott-Morton 1978)
Enterprise Resource Planning (ERP)
• Integrated enterprise-level information systems
• Sequential and non-standardised data representation schemas
are replaced with relational database management systems (RDBMS)
• Improve capture and storage of the data and the relationships between
data fields, thus reducing replication significantly
• Improve data integrity and consistency and the effectiveness of
business practices
• Data from different functions is connected and integrated into
consistent schema – single version available organization wide
• Decision makers could decide when they needed to or wanted
to create specialized reports to investigate organizational
problems and opportunities.
Executive Information Systems (EIS)
• Need for more versatile reporting led to development of EIS
• Graphical dashboards and scorecards to keep track of KPIs
• Middle data tier (Data Warehouse – DW) is created to keep the
transactional integrity of the business information systems intact
• Dashboards and scorecards got data from DW. This helped in
keeping the efficiency of ERP systems intact.
• DW driven DSSs began to be called BI systems
Data Warehousing (DW)
• Data in DW is updated periodically, hence does not reflect the
latest information
• Real-time data warehousing (right-time data warehousing)
overcomes the information latency problem through a need-based data
refreshing policy
• DWs are very large and feature rich
– Data mining & text mining are required to discover new and
useful knowledge to improve business processes &
practices
• More storage & more processing power is needed to handle
increasing volumes and variety of data
– Service oriented architecture
– Software & Infrastructure as-a-service
Big Data
• In the 2010s, new data generation media emerged due to
widespread use of Internet
– Radio-frequency identification (RFID) tags, digital meters,
clickstream weblogs, smart home devices, wearable health
monitoring, social media
• Analysis of such unstructured data rich in information content
poses significant challenges – software and hardware
• Storage: Store data in chunks on different machines connected
by a network, both logically and physically. This approach was originally
used at Google (Google File System) and later released as an Apache
project, the Hadoop Distributed File System (HDFS)
• Processing: Push computation to the data, an approach known as
MapReduce programming and later released as an Apache project, Hadoop
MapReduce.
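The processing idea above can be sketched on a single machine as a word count, the usual introductory MapReduce example. This is an illustration in plain Python, not the Hadoop API; the function names and the two text chunks are hypothetical. On a real cluster each chunk would live on a different node and the map step would run where the chunk is stored.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine the values of each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}

# Two hypothetical chunks standing in for blocks stored on different machines.
chunks = ["big data needs big storage", "spark and hadoop process big data"]

mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = reduce_phase(shuffle(mapped))
print(counts["big"], counts["data"])
```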
Business Intelligence (BI)
• BI is an umbrella term that combines architectures, tools, data
bases, analytical tools, applications and methodologies
• BI's major objectives
– Enable interactive access to data (often in real time)
– Enable manipulation of data
– Give managers and analysts the ability to conduct appropriate analyses
– Enable them to make more informed & better decisions
• BI process – transformation of data to information then to
decisions and finally to actions
• By 2005, BI systems started including artificial intelligence
capabilities as well as powerful analytical capabilities
• Managers need the right information at the right time and in the
right place
Evolution of Business Intelligence (BI)
Four Components of BI Architecture

Transaction Processing Vs Analytic
Processing
• Online Transaction Processing (OLTP)
– Constantly involved in handling updates to operational
databases (ERP)
– Handle a company’s routine ongoing business
– Inefficient for end-user ad hoc reports, queries, analysis
• Online Analytical Processing (OLAP)
– DWs contain data from OLTP in a reorganized and
structured way that is fast, efficient for querying, analysis
and decision support
Appropriate Planning and Alignment with
the Business Strategy

• Planning and Execution → Business, Organization,
Functionality, and Infrastructure
• Functions served by BI Competency Center
– How BI is linked to strategy and execution of strategy
– Encourage interaction between the potential business
user communities and the IS organization
– Serve as a repository and disseminator of best BI
practices between and among the different lines of
business.
– Standards of excellence in BI practices can be
advocated and encouraged throughout the company
Real-Time, On-Demand BI is Attainable

• Emergence of real-time BI applications


• Justifying the need
– Is there a need for real-time [is it worth the additional
expense]?
• Leveraging the enablers
– RFID
– Web services
– Intelligent agents
Critical BI System Considerations

• Developing or Acquiring BI Systems


– Make versus buy
– BI shells
• Justification and Cost–Benefit Analysis
– A challenging endeavor, why?
• Security
• Protection of Privacy
• Integration to Other Systems and Applications
Analytics Overview

• Analytics…a relatively new term/buzz-word


• Analytics…the process of developing actionable
decisions or recommendations for actions based on
insights generated from historical data
• According to the Institute for Operations Research and
Management Science (INFORMS)
– Analytics represents the combination of computer
technology, management science techniques, and
statistics to solve real problems.
Three Types of Analytics
Descriptive Analytics
• Mainly uses simple descriptive statistics, data visualization
techniques
• Descriptive analytics is the process of using current and historical data
to identify trends and relationships. It’s sometimes called the
simplest form of data analysis because it describes trends and
relationships but doesn’t dig deeper.
• Descriptive analytics uses descriptive statistics and queries to
gain insights from the data
• Dashboards created using innovative visuals form the core of
business intelligence and are an important element of analytics.
• Tableau and Qlik Sense are popular visualization tools for
creating dashboards to monitor several key performance
indicators relevant for the organization in real time.
Tableau – Sample report
Traffic and Engagement Reports
• If your organization tracks engagement in the form of social
media analytics or web traffic, you’re already using descriptive
analytics.
• These reports are created by taking raw data and using it to compare
current metrics to historical metrics and visualize trends.
– Analyse the page’s traffic data to determine the number of
users from each source. Further, compare traffic source data
to historical data from the same sources.
– This can help check on improvement; for instance,
highlighting that traffic from paid advertisements increased 20
percent year over year.
• The three other analytics types can then be used to determine
why traffic from each source increased or decreased over time, if
trends are predicted to continue, and what your team’s best
course of action is moving forward.
Financial Statement Analysis
• Financial statement analysis can be done in three primary
ways: vertical, horizontal, and ratio.
• Vertical analysis helps determine relationships between
variables, i.e., which line items take up larger and smaller
percentages of the whole.
• Horizontal analysis helps determine change over time.
• Ratio analysis directly compares items across periods, as well
as your company’s ratios to the industry’s, to gauge whether
yours is over- or underperforming.
• All of these financial statement analysis methods are
examples of descriptive analytics, as they provide information
about trends and relationships between variables based on
current and historical data.
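The three kinds of financial statement analysis can be illustrated with a small sketch. All figures below are hypothetical, and the line-item names, the helper functions, and the choice of gross margin as the ratio are assumptions made for the example.

```python
# Hypothetical income statement figures for two years (any currency unit).
statement = {
    "2022": {"revenue": 1000.0, "cost_of_goods_sold": 600.0, "operating_expenses": 250.0},
    "2023": {"revenue": 1200.0, "cost_of_goods_sold": 660.0, "operating_expenses": 300.0},
}

def vertical(year):
    """Vertical analysis: each line item as a percentage of that year's revenue."""
    base = statement[year]["revenue"]
    return {item: round(100 * value / base, 1) for item, value in statement[year].items()}

def horizontal(old, new):
    """Horizontal analysis: percentage change of each line item between two years."""
    return {
        item: round(100 * (statement[new][item] - statement[old][item]) / statement[old][item], 1)
        for item in statement[old]
    }

def gross_margin(year):
    """Ratio analysis (one example ratio): gross margin = (revenue - COGS) / revenue."""
    s = statement[year]
    return round((s["revenue"] - s["cost_of_goods_sold"]) / s["revenue"], 2)

print(vertical("2023"))
print(horizontal("2022", "2023"))
print(gross_margin("2022"), gross_margin("2023"))
```

In this made-up data, revenue grows 20 percent while cost of goods sold grows only 10 percent, so the gross margin improves from one year to the next.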
Demand Trends
• Identify trends in customer preference and behaviour and make
assumptions about the demand for specific products or
services.
• Netflix gathers data on users’ in-platform behaviour. They
analyse this data to determine which TV series and movies are
trending at any given time and list trending titles in a section of
the platform’s home screen.
• This data allows Netflix users to see what’s popular – and thus,
what they might enjoy watching
• It also helps the Netflix team to know which types of media, themes,
and actors are especially favoured at a certain time.
• This can drive decision-making about future original content
creation, contracts with existing production companies,
marketing, and retargeting campaigns.
Aggregated Survey Results
• Insights from survey and focus group data can help identify
relationships between variables and trends.
• For instance, you may conduct a survey and identify that as
respondents’ age increases, so does their likelihood to
purchase your product. If you’ve conducted this survey multiple
times over several years, descriptive analytics can tell you if
this age-purchase correlation has always existed or if it was
something that only occurred this year.
• Insights like this can pave the way for diagnostic analytics to
explain why certain factors are correlated.
• You can then leverage predictive and prescriptive analytics to
plan future product improvements or marketing campaigns
based on those trends.
Progress to Goals
• Reporting on progress toward key performance indicators
(KPIs) can help your team understand if efforts are on track or
if adjustments need to be made.
• For example, if your organization aims to reach 500,000
monthly unique page views, you can use traffic data to
communicate how you’re tracking toward it.
• Perhaps halfway through the month, you’re at 200,000 unique
page views. This would be underperforming because you’d
like to be halfway to your goal at that point—at 250,000 unique
page views.
• This descriptive analysis of your team’s progress can allow
further analysis to examine what can be done differently to
improve traffic numbers and get back on track to hit your KPI.
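The page-view example can be written as a short pacing check. This is a sketch that assumes traffic should accumulate evenly over the month; the function and parameter names are hypothetical.

```python
def pacing(actual, goal, fraction_elapsed):
    """Return the expected value so far and the gap (negative means behind plan)."""
    expected = goal * fraction_elapsed
    return expected, actual - expected

# Halfway through the month with 200,000 of a 500,000 page-view goal.
expected, gap = pacing(actual=200_000, goal=500_000, fraction_elapsed=0.5)
print(expected, gap)
```

A negative gap of 50,000 views is the descriptive finding; deciding what to change is left to the other types of analytics.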
Diagnostic Analytics
• Diagnostic analytics can be leveraged to understand why
something happened and the relationships between related
factors.
• It can be viewed as a logical next step after using descriptive
analytics to identify trends.
• There are several concepts to understand before diving into
diagnostic analytics:
– Hypothesis testing is the statistical process of proving or
disproving an assumption; it can be future or historically oriented
– When exploring relationships between variables, determining
causation is ideal; correlation, if present, can still be used to make
impactful decisions
– When statistically significant relationship is present in
historical data, regression can be used to develop forecasts
for the future
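Correlation and regression on historical data can be sketched with ordinary least squares in plain Python. The ad-spend and sales figures are hypothetical, and the sketch omits a formal significance test, which would be needed before relying on the forecast.

```python
from math import sqrt

def fit_line(x, y):
    """Least-squares line y = intercept + slope * x, plus the Pearson correlation r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r

# Hypothetical history: monthly ad spend (x) and sales (y), both in thousands.
ad_spend = [1, 2, 3, 4, 5]
sales = [2.1, 3.9, 6.2, 7.8, 10.0]

slope, intercept, r = fit_line(ad_spend, sales)
forecast = intercept + slope * 6   # forecast sales for a spend of 6
print(round(slope, 2), round(intercept, 2), round(r, 3), round(forecast, 2))
```

A correlation close to 1 shows that the two series move together in this sample; it does not by itself show that ad spend causes sales.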
Examining Market Demand
• HelloFresh – meal kit subscription company – gathers millions of
data points from global users, including information about
geographic location, disclosed demographic data, meal type,
flavour preferences, and typical order cadence and timing
• As a hypothetical example, imagine the HelloFresh team
identifies a spike in fish-based recipe orders. After conducting
diagnostic analysis, they find that the attributes most highly
correlated with ordering fish recipes are identifying as female and
living in the north-eastern United States.
• The team could conduct market research with that specific
demographic to learn more about the demand for fish recipes
• The team could also consider whether the trend is expected to
continue (predictive analytics) and if it’s worth the effort and
money to create more fish-based recipes to cater to this
audience’s preference (prescriptive analytics)
Explaining Customer Behaviour
• Diagnostic analytics is the key to understanding why
customers do what they do. These insights can be used to
improve products and user experience (UX), reposition brand
messaging, and ensure product-audience fit.
• Keeping customers is more cost-effective than obtaining new
ones. During the cancellation process, departing customers
must provide their reason for cancelling.
– doesn’t fit my budget
– doesn’t fit my schedule or dietary needs,
• HelloFresh can analyse this data and answer the question,
“Why are people cancelling their subscriptions?”
• These insights can help improve HelloFresh’s product and
user experience to avoid losing more customers to those
reasons.
Identifying Technology Issues
• One example of diagnostic analytics that requires using a
software program or proprietary algorithm is running tests to
determine the cause of a technology issue.
• This is often referred to as “running diagnostics” and may be
something you’ve done before when experiencing computer
difficulty.
• Some of these algorithms are constantly at work in the
background of your machine, while others need to be initiated
by a human.
• One type of diagnostic test you may be familiar with is
solution-based diagnostics, which detects and flags symptoms
of known issues and conducts a scan to determine the root
cause. This can allow you to address the issue and escalate it
if the cause is serious.
Improving Company Culture
• Human resource departments can gather information about
employees’ sense of physical and psychological safety, issues
they care about, and qualities and skills that make someone
successful and happy.
• Many of these insights come from running internal, anonymous
surveys and conducting exit interviews to identify factors that
contributed to employees’ desire to stay or leave.
• Gathering information about employees’ thoughts and feelings
allows you to analyse the data and determine how areas like
company culture and benefits could be improved.
• Insights from surveys and interviews can also enable hiring
managers to determine which qualities and skills make
someone successful at your company or on your specific team,
and thus help attract and hire better candidates for open roles.
Predictive Analytics
• Used for predicting what is likely to happen in the future
• Predict the probability of occurrence of a future event such as
forecasting demand for products/services, customer churn,
employee attrition, loan defaults, fraudulent transactions,
insurance claim, and stock market fluctuations
• Amazon: uses predictive analytics to recommend products to
their customers. It is reported that 35% of sales is achieved through
their recommender system
• HP: developed a flight risk score for its employees to predict who
is likely to leave the company
• Netflix: predicts which movie a customer is likely to watch
next. 75% of customer viewing is based on recommendations
Finance: Forecasting Future Cash Flow
• Every business needs to keep periodic financial records, and
predictive analytics can play a big role in forecasting your
organization’s future health.
• Using historical data from previous financial statements, as well
as data from the broader industry, you can project sales,
revenue, and expenses to craft a picture of the future and make
decisions.
Entertainment & Hospitality: Determining
Staffing Needs
• Overstaffing costs money, and understaffing could result in a
bad customer experience, overworked employees, and costly
mistakes.
• Customer influx and outflux depend on various factors, all of
which play into how many staff members a venue or hotel
needs at a given time.
• For example, to predict the number of hotel check-ins on a given
day, a team developed a multiple regression model that considered
several factors. This model enabled the casino and hotel operator
Caesars to staff its hotels and casinos and avoid overstaffing to the
best of its ability.
Marketing: Behavioural Targeting
• Predictive analytics can be applied in marketing to forecast
sales trends at various times of the year and plan campaigns
accordingly.
• Historical behavioural data can help you predict a lead’s
likelihood of moving down the funnel from awareness to
purchase.
• For instance, a single linear regression model can be used to
determine that the number of content offerings a lead engages
with predicts their likelihood of converting to a customer down
the line. With this knowledge, plan targeted ads at various
points in the customer’s lifecycle.
Manufacturing: Preventing Malfunction
• Algorithms can be trained using historical data to accurately
predict when a piece of machinery will likely malfunction and
alert an employee who can stop the machine, avoiding the cost of
damaged product and repairs.
• This analysis predicts malfunction scenarios in the moment
rather than months or years in advance.
• Some algorithms even recommend fixes and optimizations to
avoid future malfunctions and improve efficiency, saving time,
money, and effort.
Prescriptive analytics
• Prescriptive analytics is the process of using data to determine
an optimal course of action by considering all relevant factors.
• This type of analysis yields recommendations for next steps.
• Machine-learning algorithms are often used in prescriptive
analytics to parse through large amounts of data faster. Using
“if” and “else” statements, algorithms comb through data and
make recommendations based on a specific combination of
requirements.
• It’s important to note: While algorithms can provide data-
informed recommendations, they can’t replace human
discernment.
Venture Capital: Investment Decisions
• Investment decisions, while often based on gut feelings, can be
strengthened by algorithms that weigh risks and recommend
whether to invest.
• An experiment compared an algorithm’s decisions to angel
investors' decisions.
• The algorithm outperformed angel investors who were less
experienced at investing and less skilled at controlling their
cognitive biases
• Angel investors outperformed the algorithm when they were
experienced at investing and able to control their cognitive biases.
• An algorithm is only as unbiased as the data it’s trained with, so
human judgment is required whether using an algorithm or not
Sales: Lead Scoring
• Lead scoring is the process of assigning a point value to various
actions along the sales funnel, enabling you, or an algorithm, to
rank leads based on how likely they are to convert into
customers.
• Actions you can assign value to include:
– Page views
– Email interactions
– Site searches
– Content engagement, such as attending webinars, downloading e-books,
or watching videos

• Assign the highest number of points to those that imply
purchase intent and negative points to those that reveal
non-purchase intent.
• This can help prioritize outreach to leads most likely to convert
into customers, potentially saving your organization time and
money.
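A minimal lead-scoring sketch follows. The point values, the action names, and the three leads are hypothetical; a real scheme would be tuned to an organization's own funnel data.

```python
# Hypothetical point values: actions implying purchase intent score high,
# actions revealing non-purchase intent score negative.
POINTS = {
    "page_view": 1,
    "email_open": 2,
    "site_search": 3,
    "webinar_attended": 10,
    "unsubscribed": -15,
}

def score(actions):
    """Sum the points of every recorded action; unknown actions score 0."""
    return sum(POINTS.get(action, 0) for action in actions)

# Hypothetical activity logs for three leads.
leads = {
    "lead_a": ["page_view", "page_view", "email_open"],
    "lead_b": ["site_search", "webinar_attended", "page_view"],
    "lead_c": ["email_open", "unsubscribed"],
}

# Rank leads from most to least likely to convert.
ranked = sorted(leads, key=lambda name: score(leads[name]), reverse=True)
print([(name, score(leads[name])) for name in ranked])
```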
Content Curation: Algorithmic
Recommendations
• Businesses’ algorithms gather data based on your engagement
history on their platforms (and potentially others, too).
• The combinations of your previous behaviours can act as
triggers for an algorithm to release a specific recommendation.
– YouTube will recommend you watch more of the same type
of video or similar content you may find interesting
• This prescriptive analytics use case can make for higher
customer engagement rates, increased customer satisfaction,
and the potential to retarget customers with ads based on their
behavioural history.
Banking: Fraud Detection
• With the sheer volume of data stored in a bank’s system, it
would be nearly impossible for a person to manually detect any
suspicious activity in a single account.
• An algorithm – trained using customers’ historical transaction
data – analyses and scans new transactional data for
anomalies.
– For instance, perhaps you typically spend $3,000 per month,
but this month, there’s a $30,000 charge on your credit card.
• The algorithm analyses patterns in your transactional data,
alerts the bank, and provides a recommended course of action.
– In this example, the course of action may be to cancel the
credit card, as it could have been stolen.
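The anomaly check in this example can be sketched with a simple standard-deviation rule. The spending history, the threshold of three standard deviations, and the function name are assumptions; banks use far richer models than this.

```python
from statistics import mean, stdev

def is_anomalous(history, amount, threshold=3.0):
    """Flag an amount that lies more than `threshold` standard deviations from the historical mean."""
    mu, sigma = mean(history), stdev(history)
    return abs(amount - mu) > threshold * sigma

# Hypothetical monthly card spend averaging about $3,000.
monthly_spend = [2900, 3100, 3000, 2950, 3050, 3000]

print(is_anomalous(monthly_spend, 30_000))  # the $30,000 charge
print(is_anomalous(monthly_spend, 3_100))   # an ordinary month
```

Flagging the charge is the detection step; the recommended course of action (for example, cancelling the card) is the prescriptive step that follows it.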
Product Management: Development and
Improvement
• Product managers can gather user data by surveying
customers, running tests with a product’s beta versions,
conducting market research with people who aren’t current
product users, and collecting behavioural data as current users
interact.
• All this data can be analysed – either manually or algorithmically
– to identify trends, discover the reasons for those trends, and
predict whether the trends will recur.
• Prescriptive analytics can help determine which features to
include or leave out of a product and what needs to change to
ensure an optimal user experience.
Marketing: Email Automation
• Marketers use email automation to sort leads into categories
based on their motivations, mind-sets, and intentions and
deliver email content to them based on those categories.
• Any interactions leads have with emails can put them in another
category, resulting in a different set of messages being
triggered.
• While this is pure algorithmic prescriptive analysis, a person
should plan, create, and oversee automation flows.
• Email automation allows companies to provide personalized
messaging at scale and increase the chance of converting a
lead into a customer using content that applies to their
motivations and needs.
A Brief Introduction to Big Data Analytics

• What Is Big Data? (Is it just “big”?)


– Big Data is data that cannot be stored or processed
easily using traditional tools/means
– Big Data typically refers to data that comes in many
different forms: large, structured, unstructured,
continuous
▪ 3Vs – Volume, Variety, Velocity
– Data (Big Data or otherwise) is worthless if it does not
provide business value (and for it to provide business
value, it has to be analyzed)
• More on Big Data Analytics is in Chapter 7
An Overview of the Analytics Ecosystem
• Understand the broader view of how various players come together
• Understand organizations & opportunities in analytics ecosystem
• 11 key sectors or clusters are grouped into 3 categories:
– Technology Providers: provide technology, solutions, training
– Analytics Accelerators: work with technology providers/users
– Analytics Users
• Many companies play in multiple sectors
An Overview of the Analytics Ecosystem
Analytics Ecosystem – 1/3
• TP: Data Generation Infrastructure Providers
– Create infrastructure for data collection from different sources
– Sensors, Internet of Things (IoT)
• TP: Data Management Infrastructure Providers
– Provide hardware & software for data management solutions
– Hardware, storage solutions, database management systems
(SQL), Specialized integrated software (SAP), cloud
computing, software integrators, consultants, training providers
• TP: Data Warehouse Providers
– Services aimed at integrating data from multiple sources,
efficient storage, retrieval and processing
– Backbone of analytics industry
Analytics Ecosystem – 2/3
• TP: Middleware Providers
– Provide tools for reporting or descriptive analytics (Core BI)
– Offering dash-boarding, reporting, visualization
• TP: Data Service Providers
– Provide specialized external data collection, aggregation,
distribution mechanisms, such as weather data
• TP: Analytics Focused Software Developers
– Software for Descriptive, Predictive, Prescriptive analytics
• AA: Application Developers: Industry Specific or General
– Develop custom solutions for a specific industry
• AA: Analytics Industry Analysts and Influencers
– Advisers, professional societies, Authors of text books
Analytics Ecosystem – 3/3
• AA: Academic Institutions & Certification Agencies
– Universities: undergraduate, post graduate programs in
Management, CS, Statistics, Graphics etc.
– Certificate programs: proficiency in specific software
– Professional certifications from major technology providers
• AA: Regulators and Policy Makers
– Define rules, regulations for protecting employees,
customers, shareholders of analytics organizations
• Analytics User Organizations
– Organizations in every industry, regardless of size, shape
and location are using or exploring analytics
– Private sector, government, education, military etc.
Business Analytics vs Data Science
Business Analytics | Data Science
Business analytics is the statistical study of business data to gain insights. | Data science is the study of data using statistics, algorithms and technology.
Uses mostly structured data. | Uses both structured and unstructured data.
Does not involve much coding. It is more statistics oriented. | Coding is widely used. This field is a combination of traditional analytics practice with good computer science knowledge.
The whole analysis is based on statistical concepts. | Statistics is used at the end of analysis following coding.
Studies trends and patterns specific to business. | Studies almost every trend and pattern.
Top industries where business analytics is used: finance, healthcare, marketing, retail, supply chain, telecommunications. | Top industries/applications where data science is used: e-commerce, finance, machine learning, manufacturing.
Data Mining
• Data mining is a process that uses statistical, mathematical,
and artificial intelligence techniques to extract and identify
useful information and subsequent knowledge (or patterns)
from large sets of data
– Patterns: business rules, affinities, correlations, trends or
predictive models
• The nontrivial process of identifying valid, novel, potentially useful,
and ultimately understandable patterns in data stored in structured
databases. – Fayyad et al., (1996)
• Keywords in this definition: Process, nontrivial, valid, novel,
potentially useful, understandable.
Data Mining - Blend of Multiple Disciplines
Data Mining Characteristics & Objectives

• Source of data for DM is often a consolidated data
warehouse (not always!).
• DM environment is usually a client-server or a Web-
based information systems architecture.
• Data is the most critical ingredient for DM which may
include soft/unstructured data.
• The miner is often an end user.
• Striking it rich requires creative thinking.
• Data mining tools’ capabilities and ease of use are
essential (Web, Parallel processing, etc.).
How Data Mining Works
• DM builds models to discover patterns among attributes present
in a data set
– Patterns are either explanatory or predictive
• Types of patterns
– Association: commonly purchased items together
– Prediction: occurrences of events based on past data
– Cluster(segmentation): natural grouping based on characteristics
– Sequential relationships: time ordered events
• Such patterns were manually extracted from data by humans for
centuries, when the data was manageable
• The evolution of automated or semi-automated means of processing
large data sets is referred to as data mining
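The association pattern type can be illustrated by counting how often items are bought together and computing support and confidence for a rule. The baskets below are hypothetical, and this brute-force pair count is only a sketch of the idea, not an implementation of a mining algorithm such as Apriori.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
]

# Count single items and unordered item pairs across all baskets.
item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)

def support(a, b):
    """Share of all baskets that contain both a and b."""
    return pair_counts[tuple(sorted((a, b)))] / len(baskets)

def confidence(a, b):
    """Confidence of the rule a -> b: share of baskets with a that also contain b."""
    return pair_counts[tuple(sorted((a, b)))] / item_counts[a]

print(support("bread", "butter"))
print(round(confidence("bread", "butter"), 2))
print(confidence("milk", "butter"))
```

Rules with high confidence, such as milk implying butter in this toy data, are the kind of pattern a "did you forget" style reminder could draw on.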
A Taxonomy for Data Mining
Data Mining Tasks & Methods | Data Mining Algorithms | Learning Type
• Prediction
– Classification | Decision Trees, Neural Networks, Support Vector Machines, kNN, Naïve Bayes, GA | Supervised
– Regression | Linear/Nonlinear Regression, ANN, Regression Trees, SVM, kNN, GA | Supervised
– Time Series | Autoregressive Methods, Averaging Methods, Exponential Smoothing, ARIMA | Supervised
• Association
– Market-basket | Apriori, OneR, ZeroR, Eclat, GA | Unsupervised
– Link analysis | Expectation Maximization, Apriori Algorithm, Graph-based Matching | Unsupervised
– Sequence analysis | Apriori Algorithm, FP-Growth, Graph-based Matching | Unsupervised
• Segmentation
– Clustering | K-means, Expectation Maximization (EM) | Unsupervised
– Outlier analysis | K-means, Expectation Maximization (EM) | Unsupervised
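As one illustration of a supervised method from the taxonomy, here is a minimal k-nearest-neighbour (kNN) classifier sketch. The training points, labels and the function name knn_predict are made up for the example; a real project would normally rely on an established library rather than this hand-rolled version.

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    by_distance = sorted(train, key=lambda row: math.dist(row[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labelled data: (features, class label).
train = [
    ((1.0, 1.0), "low"),
    ((1.2, 0.8), "low"),
    ((0.9, 1.1), "low"),
    ((5.0, 5.0), "high"),
    ((5.2, 4.9), "high"),
    ((4.8, 5.1), "high"),
]

prediction = knn_predict(train, (1.1, 0.9))
```

The method is supervised because the class labels of the training points drive the prediction.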


Data Mining Applications – 1/2
• Customer Relationship Management
– Maximize return on marketing campaigns
– Improve customer retention (churn analysis)
– Maximize customer value (cross-, up-selling)
– Identify and treat most valued customers

• Banking & Other Financial
– Automate the loan application process
– Detect fraudulent transactions
– Maximize customer value (cross-, up-selling)
– Optimize cash reserves with forecasting

• Retailing and Logistics
– Optimize inventory levels at different locations
– Improve the store layout and sales promotions
– Optimize logistics by predicting seasonal effects
– Minimize losses due to limited shelf life
Data Mining Applications – 2/2
• Manufacturing and Maintenance
– Predict/prevent machinery failures
– Identify anomalies in production systems to optimize the use of
manufacturing capacity
– Discover novel patterns to improve product quality

• Brokerage and Securities Trading
– Predict changes in certain bond prices
– Forecast the direction of stock fluctuations
– Assess the effect of events on market movements
– Identify and prevent fraudulent activities in trading

• Insurance
– Forecast claim costs for better business planning
– Determine optimal rate plans
– Optimize marketing to specific customers
– Identify and prevent fraudulent claim activities
Text Mining Concepts
• Vast majority of business data is stored in text documents that
are unstructured
• 85% of all corporate data is in some kind of unstructured form
(e.g., text)
• Unstructured corporate data is doubling in size every 18
months
• Businesses that tap into these data & information sources gain the
knowledge needed to make better decisions, leading to a
competitive advantage over businesses that lag behind
• Goal of both text analytics & text mining is to turn unstructured
textual data into actionable information through application of
Natural Language Processing (NLP)
Text Analytics and Text Mining
Text Analytics vs Text Mining
• Text Analytics is a broader concept, which includes:
– Information retrieval based on a set of key terms
– Information extraction,
– data mining and
– web mining
• Text Mining is primarily focused on discovering new and useful
knowledge from the textual data sources
– Information retrieval
– Text mining
• Text Analytics is more commonly used in business application
context and Text Mining is frequently used in academic
research circles.
• In practice, Text Analytics and Text Mining are often used synonymously
Text Mining
• Is the semi-automated process of extracting patterns (useful
information & knowledge) from large amounts of unstructured data
sources, such as Word, PDF, text, and XML files
• Text mining process
– Impose structure on the text-based data sources
– Extract relevant information & knowledge using data mining
tools and techniques
• Text mining is useful in areas such as
– Law (court orders), Academic research, Finance (quarterly
reports), Medicine (discharge summaries), biology (molecular
interactions), Technology (patents), Marketing (customer
comments)
– Electronic communication
▪ Spam filtering, prioritization and categorization
▪ Automatic response generation
Text Mining – 1/2
1. Text pre-processing: transforms a raw text file into a clearly
defined sequence of linguistically meaningful units
– Text clean-up: ranges from removing advertisements from web pages
to cutting out tables and figures, etc.
– Tokenization: segmentation of sentences into words by splitting on
spaces, commas, etc.
– Filtering: removes words that carry little content, including
articles, conjunctions, prepositions, etc. Very frequently repeated
words are also removed.
– Stemming: transforming a word to its stem. For example, the
word “go” is the stem of goes, going and gone.
– Lemmatization: maps a word to its linguistically correct root
– Linguistic processing: involves part-of-speech (POS) tagging,
word sense disambiguation (WSD) and semantic structure
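The tokenization, filtering and stemming steps above can be sketched in a few lines. The stop-word list, the suffix rules and all function names are assumptions for illustration; naive_stem is a crude stand-in for a real stemmer and does not handle irregular forms.

```python
import re

# Tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "in", "on", "to", "is", "are"}

def tokenize(text):
    """Split text into lowercase word tokens, dropping spaces and punctuation."""
    return re.findall(r"[a-z]+", text.lower())

def filter_tokens(tokens):
    """Remove stop words such as articles, conjunctions and prepositions."""
    return [t for t in tokens if t not in STOP_WORDS]

def naive_stem(token):
    """Strip a few common suffixes; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 2:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Run tokenization, filtering and stemming in sequence."""
    return [naive_stem(t) for t in filter_tokens(tokenize(text))]

tokens = preprocess("The customer is going to the shops and goes online.")
# tokens == ["customer", "go", "shop", "go", "online"]
```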
Text Mining – 2/2
2. Text transformation: choosing the subset of significant features
that are used in creating a model. It reduces dimensionality by
excluding redundant and unnecessary features
3. Text mining methods: methods such as classification, clustering,
summarization, and many more are applied
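One possible reading of the text transformation step is sketched below: keep only terms that appear in enough documents, then represent each document as a vector of term counts. The documents, the min_df threshold and the function names are hypothetical; document frequency is just one of several feature selection criteria.

```python
from collections import Counter

# Hypothetical pre-processed documents (lists of tokens).
docs = [
    ["price", "delivery", "late"],
    ["price", "quality", "good"],
    ["delivery", "late", "refund"],
]

def select_features(docs, min_df=2):
    """Keep terms that occur in at least `min_df` documents."""
    df = Counter(term for doc in docs for term in set(doc))
    return sorted(term for term, count in df.items() if count >= min_df)

def to_vectors(docs, features):
    """Represent each document as term counts over the selected features."""
    return [[doc.count(f) for f in features] for doc in docs]

features = select_features(docs)      # ["delivery", "late", "price"]
vectors = to_vectors(docs, features)  # [[1, 1, 1], [0, 0, 1], [1, 1, 0]]
```

The resulting vectors are the structured input that classification or clustering methods can then work on.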
Text Mining Applications
• Information extraction: identification of key phrases &
relationships within text by looking for predefined objects and
sequences in text by way of pattern matching
• Topic tracking: predict documents of interest based on user
profile & past history
• Summarization: condense a document to save the reader time
• Categorization: identify the main theme & place the document in a predefined category
• Clustering: grouping similar documents without predefined
categories
• Concept linking: connects related documents by identifying shared
concepts (help find information not found using search methods)
• Question answering: Finding best answer through knowledge
driven pattern matching
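Information extraction by pattern matching can be illustrated with regular expressions. The order-ID format, the currency notation and the sample sentence below are invented for the example; real extraction rules depend on the documents at hand.

```python
import re

# Assumed formats (hypothetical): order IDs like "ORD-12345", amounts like "Rs. 450".
ORDER_ID = re.compile(r"\bORD-\d{5}\b")
AMOUNT = re.compile(r"\bRs\.\s?(\d+(?:\.\d{2})?)")

def extract(text):
    """Pull predefined objects (order IDs and amounts) out of free text."""
    return {
        "order_ids": ORDER_ID.findall(text),
        "amounts": [float(a) for a in AMOUNT.findall(text)],
    }

info = extract("Order ORD-10442 was charged Rs. 450 instead of Rs. 399.50.")
```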
Data Visualization vs Visual Analytics
• Data visualization: “the use of visual representations to explore, make sense of,
and communicate data.”
• Data visualization presents information following the
aggregation, summarization, and contextualization of data
• Data visualization is aimed at answering (associated with BI)
– What happened?
– What is happening?
• Visual analytics is the combination of visualization and
predictive analytics. Visual analytics is aimed at answering
– Why is it happening?
– What is more likely to happen?
• Visual analytics is usually associated with business analytics
Visual Analytics by SAS Institute
Information Dashboards
• Dashboards provide visual displays of important information that
is consolidated and arranged on a single screen so that
information can be digested at a single glance and easily drilled
in and further explored
• The fundamental challenge of dashboard design is to display all
the required information on a single screen, clearly and without
distraction, in a manner that can be assimilated quickly
• Three layers of information
– Monitoring: to monitor key performance metrics
– Analysis: to find root cause of problems
– Management: Identify what actions to take to resolve problem
Performance Dashboards
What to look for in a dashboard
• Use of visual components to highlight data and exceptions that
require action
• Transparent to the user, meaning that they require minimal
training and are extremely easy to use
• Combine data from a variety of systems into a single,
summarized, unified view of the business
• Enable drill-down or drill-through to underlying data sources or
reports
• Present a dynamic, real-world view with timely data
• Require little coding to implement, deploy, and maintain
Web Mining Overview
• Customers are expecting companies to offer their
products/services over the internet
• Customers are using the internet for
– Buying products/services
– Talking about companies
– Sharing transactional/usage experience with others
• Delays in service, manufacturing, shipping, delivery and customer
inquiries are no longer private incidents
• Successful companies are embracing the internet to
– Better their business processes
– Better communicate with customers
– Understand their needs and wants
– Serve them thoroughly and expeditiously
Challenges for Knowledge Discovery @Web
• Because of its sheer size and complexity, mining the web is not an
easy undertaking by any means
• Search engines constantly search the web and index web pages
under certain keywords.
• Simple keyword-based search engines suffer from deficiencies
– A topic of any breadth contains hundreds/thousands of pages
– Many documents that are highly relevant may not contain
exact key words defining them
• Web mining can identify authoritative web pages, classify web
documents and resolve many ambiguities and subtleties raised in
keyword-based web search engines
Web Mining vs Web Analytics
• Web mining is the process of discovering intrinsic relationships
(interesting & useful information) from web data, which are
expressed in the form of textual, linkage, or usage information
• Web mining is inclusive of all the data generated via internet
including transaction, social, and usage data.
• Web mining aims to discover previously unknown patterns and
relationships (using novel predictive or prescriptive analytics)
• Web mining relies heavily on data mining and text mining and
their enabling tools and techniques
• Web analytics focuses primarily on website usage data
• Web analytics aims to describe what has happened on the web
site (metric-driven descriptive analytics)
Types of Web Mining
[Figure: Web mining builds on data mining and text mining and divides into three types]
• Web Content Mining (source: unstructured textual content of the Web pages, usually in HTML format)
• Web Structure Mining (source: the uniform resource locator (URL) links contained in the Web pages)
• Web Usage Mining (source: the detailed description of a Web site’s visits, i.e. sequence of clicks by sessions)
• Related areas shown in the figure: Search Engines, Sentiment Analysis, Semantic Webs, Web Analytics, Page Rank, Information Retrieval, Graph Mining, Social Analytics, Clickstream Analysis, Search Engine Optimization, Social Network Analysis, Social Media Analytics, Weblog Analysis, Marketing Attribution, Customer Analytics, 360 Customer View, Voice of the Customer
Web Content/Structure Mining
• Mining the textual content on the Web
• Data collection via Web crawlers
• Web pages include hyperlinks
– Authoritative pages
– Hubs
– Hyperlink-induced topic search (HITS) algorithm
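The hub and authority idea behind HITS can be sketched as a simple iteration: a page's authority score is the sum of the hub scores of pages linking to it, and its hub score is the sum of the authority scores of the pages it links to. The link graph, the iteration count and the function name below are assumptions; this is a minimal sketch rather than a production implementation.

```python
import math

def hits(links, iterations=50):
    """Return (hub, authority) scores for a dict page -> list of pages it links to."""
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum of hub scores of pages linking in.
        auth = {p: 0.0 for p in pages}
        for p, targets in links.items():
            for q in targets:
                auth[q] += hub[p]
        # Hub update: sum of authority scores of pages linked to.
        hub = {p: sum(auth[q] for q in links.get(p, [])) for p in pages}
        # Normalize so the scores stay bounded.
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth

# Hypothetical link graph: A and B and D all point to C; A also points to D.
links = {"A": ["C", "D"], "B": ["C"], "C": [], "D": ["C"]}
hub, auth = hits(links)
```

In this toy graph C receives the most links and ends up as the strongest authority, while A, which points to both C and D, is the strongest hub.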
Web Usage Mining (Web Analytics)
• Extraction of information from data generated through Web page
visits and transactions. Clickstream data
– data stored in server access logs, referrer logs, agent logs,
and client-side cookies
• Web analytics holds promise of revolutionizing how business is
done on the web
– Tool for e-business market research to improve e-commerce
• Two categories of web analytics
– Off-site: measurement takes place outside your website
– Onsite: on-site visitor measurement in commercial context
• Website data is compared against KPIs and used to improve a
marketing campaign’s audience response
Data Collection – Web Analytics
• Traditional method: server log files
– Web server records file requests made by browsers
• Page tagging: mouse clicks are captured by JavaScript
embedded in site pages and the data is sent to a dedicated
third-party analytics server
• Other Sources:
– Email, direct mail campaign data,
– sales and lead history
– social media originated data
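For the server log file method, a request record can be parsed with a regular expression. The sketch assumes the logs use the Common Log Format, which the slides do not specify, and the sample line is made up.

```python
import re
from datetime import datetime

# Regex for the Common Log Format (an assumption about how the server logs).
CLF = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = CLF.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    return rec

# Made-up sample request.
sample = '203.0.113.7 - - [10/Oct/2023:13:55:36 +0530] "GET /cart HTTP/1.1" 200 2326'
record = parse_line(sample)
```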
Web Usage Mining Applications
• Determine the lifetime value of clients
• Design cross-marketing strategies across products.
• Evaluate promotional campaigns
• Target electronic ads and coupons at user groups based on user
access patterns
• Predict user behavior based on previously learned rules and
users' profiles
• Present dynamic information to users based on their interests and
profiles
Web Usage Mining (Clickstream Analysis)

[Figure: clickstream analysis process]
• Inputs: Website, User/Customer, Weblogs
• Pre-Process Data: Collecting, Merging, Cleaning, Structuring
– Identify users
– Identify sessions
– Identify page views
– Identify visits
• Extract Knowledge: Usage patterns, User profiles, Page profiles, Visit profiles, Customer value
• Feedback: How to better the data, How to improve the Web site, How to increase the customer value
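The "identify sessions" step of pre-processing is often done with an inactivity timeout. The sketch below assumes a 30-minute timeout and a simple (user, timestamp, page) record layout; both are assumptions for illustration, as are the function and variable names.

```python
from datetime import datetime, timedelta

def sessionize(page_hits, timeout=timedelta(minutes=30)):
    """Group (user, timestamp, page) hits into per-user sessions.

    A new session starts when a user's gap since the previous hit exceeds
    `timeout` (the 30-minute default is an assumed convention).
    """
    sessions = []
    last = {}  # user -> (last timestamp, index of that user's current session)
    for user, ts, page in sorted(page_hits, key=lambda h: (h[0], h[1])):
        if user in last and ts - last[user][0] <= timeout:
            idx = last[user][1]
            sessions[idx]["pages"].append(page)
        else:
            sessions.append({"user": user, "pages": [page]})
            idx = len(sessions) - 1
        last[user] = (ts, idx)
    return sessions

t0 = datetime(2024, 1, 1, 9, 0)
page_hits = [
    ("u1", t0, "/home"),
    ("u1", t0 + timedelta(minutes=5), "/product"),
    ("u1", t0 + timedelta(hours=2), "/home"),
    ("u2", t0 + timedelta(minutes=1), "/home"),
]
sessions = sessionize(page_hits)
```

Here user u1 ends up with two sessions because of the two-hour gap, and u2 with one.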
Web Analytics Metrics
• Web analytics can be used to effectively manage the marketing
efforts of the organization and its various products or services
• Web analytics programs provide nearly real-time data, which can
document your marketing campaign success or empower you to
make timely adjustments to your current marketing strategy
• Four categories of metrics:
– Web site usability: How were the visitors using my Web site?
– Traffic sources: Where did they come from?
– Visitor profiles: What do my visitors look like?
– Conversion statistics: What does it all mean for the business?
Web Analytics Metrics
• Web Site Usability
– Page views
– Time on site
– Downloads
– Click map
– Click paths
• Traffic Source
– Referral Web sites
– Search engines
– Direct
– Offline campaigns
– Online campaigns
• Visitor Profiles
– Keywords
– Content groupings
– Geography
– Time of day
– Landing page profiles
• Conversion Statistics
– New visitors
– Returning visitors
– Leads
– Sales/conversions
– Abandonment/exit rate
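A few of these metrics can be computed directly from visit records. The record layout and the numbers below are invented for illustration; commercial web analytics tools define and compute these metrics in their own ways.

```python
# Hypothetical per-visit records: (visitor_id, is_new_visitor, pages_viewed, converted).
visits = [
    ("v1", True, 1, False),
    ("v2", True, 5, True),
    ("v3", False, 3, False),
    ("v4", False, 1, False),
]

def summarize(visits):
    """Compute simple usability and conversion metrics over a list of visits."""
    n = len(visits)
    return {
        "new_visitor_share": sum(v[1] for v in visits) / n,
        "page_views_per_visit": sum(v[2] for v in visits) / n,
        "conversion_rate": sum(v[3] for v in visits) / n,
    }

metrics = summarize(visits)
```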
A Sample Web Analytics Dashboard
Social Analytics
• Defined as monitoring, analysing, measuring and interpreting
digital interactions and relationships of people, topics, ideas
and content
• Includes mining textual content from social media and
analysing socially established networks for the purpose of
gaining insight about existing and potential customers’ current
and future behaviours, and about their likes and dislikes towards
a firm’s products and services
• Two types of Social Analytics
– Social Network Analysis (SNA)
– Social Media Analytics (SMA)
Social Network Analysis (SNA)
• Social Network: a social structure composed of individuals linked
to each other through some type of connection/relationship
• A holistic approach to analyse the structure and dynamics of
social entities
– Identify local & global patterns
– Locate influential entities
– Examine network dynamics
• Social networks are self-organizing, emergent and complex
• Typical social network types
– Communication networks, community networks, criminal
networks, innovation networks
Social Network Analysis Metrics
• Various metrics are used to analyze social network structures
from different perspectives. These metrics are grouped into
• Connections
– Homophily, Multiplexity, Mutuality/reciprocity
– Network closure, Propinquity
• Segmentation
– Cliques and social circles
– Clustering coefficient, Cohesion
• Distribution
– Bridge, Centrality
– Density, Distance
– Structural holes
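Three of these metrics (density, degree centrality and the clustering coefficient) are easy to compute on a small undirected network. The network and names below are made up, and the formulas are the usual textbook definitions for undirected graphs; other centrality variants exist.

```python
from itertools import combinations

# Hypothetical undirected friendship network as an adjacency dict.
graph = {
    "Ann": {"Bob", "Cat", "Dan"},
    "Bob": {"Ann", "Cat"},
    "Cat": {"Ann", "Bob"},
    "Dan": {"Ann"},
}

def density(g):
    """Share of possible ties that actually exist."""
    n = len(g)
    edges = sum(len(nbrs) for nbrs in g.values()) / 2
    return edges / (n * (n - 1) / 2)

def degree_centrality(g):
    """Each node's degree divided by the maximum possible degree (n - 1)."""
    n = len(g)
    return {node: len(nbrs) / (n - 1) for node, nbrs in g.items()}

def clustering_coefficient(g, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    nbrs = g[node]
    if len(nbrs) < 2:
        return 0.0
    pairs = list(combinations(sorted(nbrs), 2))
    linked = sum(1 for a, b in pairs if b in g[a])
    return linked / len(pairs)
```

In this network Ann is tied to everyone (degree centrality 1.0), while only one of her three neighbour pairs (Bob and Cat) is connected.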
Social Media – Definitions and Concepts
• Enabling technologies of social interactions among people
• Takes on many different forms
– Internet forums, Web logs, social blogs, microblogging, wikis,
social networks, podcasts, pictures, video, and product reviews
• Six types of social media based on media research and social
process
– Collaborative projects (Wikipedia)
– Blogs and microblogs (Twitter)
– Content communities (YouTube)
– Social networking sites (Facebook)
– Virtual game worlds (World of Warcraft)
– Virtual social worlds (Second Life)
Social versus Industrial Media
• Web-based social media are different from traditional/industrial
media, such as newspapers, television, and film
• Differentiating characteristics
– Quality
– Reach
– Frequency
– Accessibility
– Usability
– Immediacy
– Updatability
Six Levels of Engagement
• Creators: publish a blog or web page, upload audio/music and
video of their own creation and post articles and stories they have
written.
• Critics: post ratings and reviews of products or services and
comment on others’ blogs.
• Joiners: participate in online forums and contribute to, or edit,
articles in a wiki.
• Collectors: use RSS feeds, vote in online contests and websites,
add tags to web pages or photos and click “like” on Facebook entries.
• Spectators: read blogs, listen to podcasts, watch video from other
users, read online forums as well as customer ratings/reviews.
• Inactive: do none of the above.
Social Media Analytics
• Social media analytics refers to the systematic and scientific ways to
consume the vast amount of content created by Web-based social media
outlets, tools, and techniques for the betterment of an organization’s
competitiveness
• Social media analytics allows organizations to reach out to and
understand consumers as never before.
• Tool for integrated marketing and communications strategies
• Chance to join a conversation with millions of customers around
the globe every day
Measuring Social Media Impact
• Valuable insight is hidden in all the user-generated content on
social media sites.
• Challenges
– How to dig insights out of social media messages?
– How to measure the impact of the organization’s efforts?
• The multitude of social media analytics tools can be grouped into the
following broad categories:
– Descriptive analytics: use simple statistics to identify activity
characteristics and trends
– Social network analysis: identify biggest sources of influence
– Advanced analytics: identify themes, sentiments and
connections through content analysis using predictive and
text analytics
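As a toy illustration of identifying sentiment through content analysis, the sketch below labels posts by counting words from a small hand-made lexicon. The lexicon, the posts and the scoring rule are all assumptions; real sentiment analysis relies on much larger lexicons or trained models.

```python
import re
from collections import Counter

# Tiny made-up sentiment lexicon; real lexicons hold thousands of scored words.
POSITIVE = {"love", "great", "fast", "good"}
NEGATIVE = {"late", "bad", "broken", "slow"}

def sentiment(post):
    """Label a post by counting positive versus negative lexicon words."""
    words = re.findall(r"[a-z']+", post.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

# Hypothetical posts.
posts = [
    "Love the new app, great and fast checkout",
    "Delivery was late and the box was broken",
    "Ordered again today",
]
summary = Counter(sentiment(p) for p in posts)
```

Counting the labels over time gives a simple descriptive view of how sentiment towards a brand is trending.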
Best Practices in Social Media Analytics
• Think of measurement as a guidance system, not a rating
system
• Track the elusive sentiment
• Continuously improve the accuracy of text analysis
• Look at the ripple effect
• Look beyond the brand
• Identify your most powerful influencers
• Look closely at the accuracy of your analytic tool
• Incorporate social media intelligence into planning
