
Lecture 2

Some unfinished business

• Plagiarism and copying will not be tolerated


• AI guidelines are given in the course outline, please follow them
In God we trust, all others must bring data!

- W. Edwards Deming
Process of data-driven decision making

Identify the problem or opportunity for value creation

Identify source of data (primary as well as secondary)

Pre-process and transform the data to suit the input variables of the model

Divide the data set into training and validation subsets

Build analytical models and identify the best model

Implement solution/decision
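
The "divide the data" step above can be sketched in a few lines. This is only an illustration: the records, the 80/20 split, and the name `split_train_validation` are assumptions made for the example, not part of the lecture.

```python
import random

# Hypothetical records: (input variable, outcome) pairs invented for illustration.
records = [(x, 2 * x + 1) for x in range(20)]


def split_train_validation(rows, validation_fraction=0.2, seed=42):
    """Shuffle a copy of the rows, then split it into training and validation subsets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


train, validation = split_train_validation(records)
print(len(train), "training rows;", len(validation), "validation rows")
```

Models are built on the training subset and compared on the validation subset, which the model has not seen.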
Akshaya Patra Foundation

• 84,000 school children from 650 schools are served food from 1
kitchen in Bangalore
• Vehicle Routing Problem
• Cost and time are the limiting factors

• Lesson: Data can be used to solve problems for an SME, not just
a large corporation
BI vs BA

• Business Intelligence (BI) refers to the technologies, processes, and practices used
to transform raw data into meaningful and useful information that can inform an
organization's decision making. It aims to provide a historical, current and
predictive view of the organization's operations, by using dashboards, reports, and
visualizations to provide insights.
• Business Analytics (BA) is a subset of BI that focuses on using data, statistical
algorithms and machine learning techniques to identify and understand the
relationships between different aspects of an organization's operations, with the
goal of making better decisions. It uses quantitative and statistical methods to drive
decision making, and often includes predictive analytics, prescriptive analytics, and
causal analytics.
• In summary, BI provides a broad overview of the organization's operations, while BA
focuses on using data-driven insights to inform decision making.
Business Analytics

• Business analytics is the practice of using data, statistical
algorithms, and machine learning techniques to extract insights
and knowledge from data, and transform that information into
actionable business decisions
• Business analytics encompasses a range of methodologies and
techniques including
• Descriptive analytics,
• Diagnostic analytics,
• Predictive analytics, and
• Prescriptive analytics
Purpose of BA

• The purpose of business analytics is to:
1. Help organizations make more informed decisions by providing
insights and predictions based on data
2. Optimize operations, improve customer
experiences, and drive business growth
3. Give organizations a competitive advantage, increase efficiency, and
support decisions that are based on evidence rather than intuition
Aspects or Constituents of BA

[Diagram: Business Analytics, made up of Business context, Technology, and Data Science]


Business Context

• Target Store - one of the largest retail chains, with sales of
$106 billion in 2022
• A pregnancy score is assigned to each female customer - why did this
customer segment become important?
• Price insensitive, willing to spend more on comfort
• Baby related product market $ 38 billion
• Shopping behaviour changes during special events
• Grocery shopping
• Did you forget feature
• Search time due to large number of items? Smart market
Technology- Enables Data Science
• Technology is needed to capture data (Vital to identify any
problem)
• Data capture, Data storage, Data preparation/cleansing, Data
analysis, Data share
• Models and methods used to analyze data require high
computing power
• Unstructured data
• Large data
• Complex algorithms
• General programming languages like R/Python or specialized
suites like Microsoft Power BI and Tableau
• Technology enables solution to business analytics problems in
stipulated time, and cost effectively
Data Science

• One of the problem solving techniques


• Statistical techniques
• Operations research/ optimization
• ML/AI techniques
• ‘Target’ case is a classification problem
• Based on purchases, predict pregnant or not-pregnant
• Each problem can usually be solved using multiple methods
• And each method may result in multiple models
• Selecting the right model is the skill to be learned
Analytics problem solving

• Start with the "what" of the features (Descriptive)
• Then answer the "why" of the features (Diagnostic)
• Then model to see the possible outcome (Predictive)
• Last, take a decision to make it happen (Prescriptive)
Business Analytics- Types

[Diagram: Business Analytics branches into Descriptive Analytics, Diagnostic Analytics, Predictive Analytics, and Prescriptive Analytics]
If the statistics are boring, then you've got
the wrong numbers
- Edward R. Tufte
Descriptive Analytics

• Descriptive analytics is a type of data analysis that focuses on
summarizing and describing the characteristics of a dataset. It
involves using statistical methods and visualizations to summarize
and present the data in a meaningful way. Descriptive analytics
typically provides a historical perspective of the data, rather than
making predictions about future events.
• Examples of descriptive analytics include:
• Generating summary statistics, such as mean, median, and standard
deviation
• Creating frequency distributions and histograms to show the distribution
of the data
• Generating charts and graphs to visualize trends, patterns, and
relationships in the data
• Calculating cross-tabulations and contingency tables to analyze
relationships between variables
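
A minimal sketch of the first two examples above (summary statistics and a frequency distribution), using only the Python standard library. The `orders` values are hypothetical numbers invented for illustration.

```python
import statistics
from collections import Counter

# Hypothetical daily order counts, invented for illustration.
orders = [12, 15, 11, 15, 18, 20, 15, 11, 14, 19]

summary = {
    "mean": statistics.mean(orders),
    "median": statistics.median(orders),
    "stdev": statistics.stdev(orders),  # sample standard deviation
}
frequency = Counter(orders)  # frequency distribution: value -> count

print(summary)

# A crude text histogram: one row per distinct value, one '#' per occurrence.
for value in sorted(frequency):
    print(f"{value:>3} | {'#' * frequency[value]}")
```

Nothing here predicts anything. It only summarizes what has already happened, which is the point of descriptive analytics.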
Descriptive Analytics

• The goal of descriptive analytics is to provide insights and
understanding of the data and to support informed decision
making. Descriptive analytics is often used as a starting point for
more advanced forms of data analysis, such as predictive analytics
and prescriptive analytics.
• A real-life case could be Flipkart using descriptive
analytics to identify trends in seasonal demand and adjust
its inventory, recommender system, and online marketing campaigns
accordingly.
Diagnostic Analytics
• Diagnostic analytics is a type of data analysis that focuses on identifying the root
cause of a problem or issue. It involves using a variety of methods and techniques
to examine data and identify correlations, patterns, and trends that can help
explain why a particular problem is occurring.
• Examples of diagnostic analytics include:
• Examining data from multiple sources to identify correlations, relationships and causations
• Using data visualization tools to identify patterns and anomalies in the data
• Conducting statistical tests to determine the significance of relationships between variables
• Applying machine learning algorithms to identify hidden relationships and patterns in the data
• The goal of diagnostic analytics is to provide insights into the underlying causes of
a problem or issue, so that organizations can take appropriate action to address the
problem and prevent it from happening again in the future. This type of analytics is
typically used to solve complex problems and improve operational efficiency.
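
One of the simplest diagnostic tools mentioned above is checking how strongly two variables move together. The sketch below computes the Pearson correlation coefficient by hand. The route lengths and delays are hypothetical values invented for illustration, and `pearson_correlation` is just a name chosen for this example.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical delivery data, invented for illustration.
route_length_km = [5, 8, 12, 15, 20, 25]
delay_minutes = [2, 3, 6, 7, 11, 13]

r = pearson_correlation(route_length_km, delay_minutes)
print("Correlation between route length and delay:", round(r, 3))
```

A value close to 1 suggests that longer routes go together with longer delays. It does not prove that route length causes the delay, so a correlation like this is a starting point for root-cause analysis, not its conclusion.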
Diagnostic Analytics
• An example: use of the technique by Amazon to improve its delivery process.
• Amazon uses a variety of data sources, such as package tracking
information, driver data, and customer feedback, to identify bottlenecks
in its delivery process.
• By using diagnostic analytics, Amazon was able to identify areas where
packages were frequently delayed and pinpoint the root cause of the
problem, such as poor route planning or insufficient delivery capacity.
• As a result of the insights obtained through diagnostic analytics, Amazon was
able to take action to improve its delivery process, including optimizing
routes, increasing delivery capacity, and improving customer
communication.
• This not only improved the customer experience, but also resulted in cost
savings for Amazon by reducing the number of delivery failures and returns.
• This example demonstrates how diagnostic analytics can be used to identify
and solve complex problems in a real-world setting, leading to significant
improvements in business operations and customer satisfaction.
If you torture the data long enough, it will
confess
- Ronald Coase
Predictive Analytics
• Predictive analytics is a type of data analysis that uses
statistical and machine learning techniques to make
predictions about future events based on historical data. The
goal of predictive analytics is to provide organizations with
insights into future trends and patterns, so that they can
make informed decisions and take proactive actions.
• Examples of predictive analytics include:
• Forecasting sales, customer behavior, and market trends
• Predicting which customers are most likely to churn or respond to a
marketing campaign
• Detecting fraud and other security threats
• Improving supply chain management and logistics by predicting
demand and optimizing routes
Predictive Analytics

• Predictive analytics involves a combination of data collection,
data preparation, model building, and model validation.
Predictive models can be based on a wide range of techniques,
including regression analysis, decision trees, neural networks, and
time-series analysis.
• Predictive analytics is used by organizations across a wide range of
industries, including finance, retail, healthcare, and
transportation, to make informed decisions, improve operational
efficiency, and drive business growth.
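
As a small illustration of the model-building step, the sketch below fits a straight line by ordinary least squares and uses it to forecast the next period. The monthly sales figures are hypothetical, and `fit_line` is a name chosen for this example. A real project would also validate the model on held-out data.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical monthly sales, invented for illustration.
months = [1, 2, 3, 4, 5, 6]
sales = [102, 108, 121, 131, 138, 152]

intercept, slope = fit_line(months, sales)
forecast_month_7 = intercept + slope * 7  # prediction for the next, unseen month
print("Slope per month:", round(slope, 2))
print("Forecast for month 7:", round(forecast_month_7, 2))
```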
Predictive Analytics
• An example: Netflix uses predictive analytics to recommend content to its subscribers. Netflix collects and
analyzes data on subscriber viewing habits, such as which shows and movies they
watch and when, to make personalized content recommendations.
• By using predictive analytics, Netflix is able to generate insights into what its
subscribers are likely to watch next and make recommendations based on those
insights.
• This helps improve the customer experience by providing relevant and
personalized recommendations, leading to increased engagement and reduced
churn.
• For example, Netflix might use predictive analytics to identify subscribers who are
fans of certain genres of movies or TV shows and make recommendations for
similar content.
• This information is then used to create customized recommendation algorithms
that provide each subscriber with a unique experience.
• This example demonstrates how predictive analytics can be used to improve the
customer experience and drive business growth in a real-world setting.
• By using predictive analytics, Netflix was able to stay ahead of its competitors
Every decision has a consequence

- Damon Darrel
Prescriptive Analytics

• Prescriptive analytics is a type of advanced analytics that provides
decision-makers with specific recommendations for actions to take in
order to achieve a desired outcome.
• Unlike predictive analytics, which focuses on forecasting future events,
prescriptive analytics focuses on optimizing decision-making in
real-time by considering multiple possible scenarios and outcomes.
• Examples of prescriptive analytics include:
• Optimizing pricing strategies for maximum profitability
• Finding the most efficient logistics routes for delivery and transportation
• Determining the best resource allocation for manufacturing and production
processes
• Designing and implementing optimal workforce schedules for maximum
productivity
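
To make the routing example concrete, here is a toy sketch that tries every possible visiting order for a handful of stops and recommends the shortest round trip. The depot, the stop coordinates, and the use of straight-line distance are all assumptions made for illustration.

```python
import math
from itertools import permutations

# Hypothetical depot and delivery stops (x, y), invented for illustration.
depot = (0, 0)
stops = {"A": (2, 3), "B": (5, 1), "C": (1, 6), "D": (4, 4)}


def route_length(order):
    """Total straight-line length of depot -> stops in the given order -> depot."""
    path = [depot] + [stops[name] for name in order] + [depot]
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))


# Brute force: evaluate every visiting order and keep the shortest one.
best_order = min(permutations(stops), key=route_length)
print("Recommended order:", " -> ".join(best_order))
print("Route length:", round(route_length(best_order), 2))
```

Brute force only works for a few stops, because the number of possible orders grows factorially. Realistic routing problems need optimization techniques instead.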
Prescriptive Analytics

• Prescriptive analytics uses a combination of mathematical
algorithms, decision science, and artificial intelligence to
analyze data and generate recommendations for action. The
goal of prescriptive analytics is to automate decision-making
and help organizations achieve their objectives more
efficiently and effectively.
• Prescriptive analytics is increasingly used by organizations
across a wide range of industries, including finance,
healthcare, and transportation, to improve decision-making,
increase operational efficiency, and drive business growth.
Prescriptive Analytics

• A real-life example of prescriptive analytics is its use
by UPS to optimize its
delivery routes. UPS collects and analyzes data on factors
such as traffic patterns, delivery locations, and vehicle
capacity to make decisions about the most efficient routes
for its delivery trucks.
• By using prescriptive analytics, UPS is able to generate
recommendations for the most efficient routes based on
multiple factors, including delivery times, fuel consumption,
and vehicle utilization. This helps UPS reduce fuel costs,
improve delivery times, and increase operational efficiency.
Prescriptive Analytics

• For example, UPS might use prescriptive analytics to
determine the best route for a delivery truck based on the
delivery locations, traffic patterns, and delivery deadlines.
This information is then used to create an optimized route
that maximizes efficiency and minimizes waste.
• This example demonstrates how prescriptive analytics can be
used to improve decision-making and increase operational
efficiency in a real-world setting. By using prescriptive
analytics, UPS was able to reduce its costs, improve its
delivery times, and enhance its overall competitiveness in
the delivery and logistics industry.
Some terminology in Analytics

• BA: Business Analytics


• BI: Business Intelligence
• AI: Artificial Intelligence
• ML: Machine Learning
• DL: Deep Learning
Some techniques

Predictive techniques
• Regression, Logistic regression
• Classification tree
• Forecasting
• K-nearest neighbours
• Markov chains
• Random forests
• Boosting
• Neural networks

Prescriptive techniques
• Linear programming
• Integer programming
• Combinatorial optimization
• Non-linear programming
• Meta heuristics
• Six Sigma
• Social Media Analytics tools
Some specialization areas in the field of
Business Analytics

1. Predictive Analytics
2. Big Data Analytics
3. Data Visualization
4. Machine Learning
5. Artificial Intelligence
6. Text Analytics
7. Social Media Analytics
8. Web Analytics
9. Optimization Techniques
10. Forecasting Methods
11. Deep Learning
12. Natural Language Processing
13. Recommender Systems
14. Time Series Analysis
15. Data Warehousing
16. Customer Analytics
One simple sample algorithm
K-means clustering
• K is provided by the user, we will later see how to estimate K
• Divides the data into k groups based on some kind of distance
• Estimation of K is done using scree plot or elbow plot
• Calculate distance between two points (also called Euclidean
distance)
• Pythagoras – right angled triangle
• Use it for distances on axes
• Define Euclidean distance
• Data = {(4,21), (5,19), (10,24), (4,17), (3,16), (11,25),
(14,24), (6,22), (10,21), (12,21)}
• Assume k=2
K-means clustering
• Step-1: Select the number K to decide the number of clusters
• Step-2: Select K random points as the initial centroids. (They need
not be points from the input dataset)
• Step-3: Assign each data point to its closest centroid,
which will form the predefined K clusters
• Step-4: Calculate the mean of the points in each cluster and place
the new centroid of that cluster there
• Step-5: Repeat Step-3, which means reassign each
data point to the new closest centroid
• Step-6: If any reassignment happened, go back to Step-4;
otherwise stop
• Step-7: The model is ready
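
The steps above can be sketched in plain Python on the ten data points from the earlier slide with k=2. One assumption to note: the steps call for random starting centroids, but this sketch fixes them at (3, 16) and (14, 24) so that the result is repeatable. The function name `kmeans` is chosen for this example.

```python
import math

DATA = [(4, 21), (5, 19), (10, 24), (4, 17), (3, 16),
        (11, 25), (14, 24), (6, 22), (10, 21), (12, 21)]


def kmeans(points, initial_centroids, max_iter=100):
    """Basic k-means: returns the final centroids and the cluster index of each point."""
    centroids = list(initial_centroids)
    assignment = None
    for _ in range(max_iter):
        # Assign each point to its closest centroid (Euclidean distance).
        new_assignment = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Stop as soon as no point changes cluster.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Move each centroid to the mean of the points assigned to it.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignment


# Assumed starting centroids (fixed instead of random, for repeatability).
centroids, assignment = kmeans(DATA, [(3, 16), (14, 24)])
print("Centroids:", centroids)
print("Cluster of each point:", assignment)
```

With these starting centroids the algorithm settles after one update, with five points in each cluster. Different starting centroids can lead to a different final grouping.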
Plot of data points
Plots
K-means clustering

• Estimating the value of k using a plot


• We run the algorithm for k = 1 to 10 and plot the variance
• This is called an elbow plot
• We select the point of elbow
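
A sketch of how the numbers behind an elbow plot could be produced for the ten data points used earlier. It takes "variance" to mean the within-cluster sum of squared distances, and it starts each run from the first k data points. Both choices are assumptions made for this example. The values are printed rather than plotted, and any charting tool can draw them.

```python
import math

DATA = [(4, 21), (5, 19), (10, 24), (4, 17), (3, 16),
        (11, 25), (14, 24), (6, 22), (10, 21), (12, 21)]


def kmeans_wcss(points, k, max_iter=100):
    """Run a basic k-means and return the within-cluster sum of squares."""
    centroids = points[:k]  # assumption: first k points as starting centroids
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    # Sum of squared distances from each point to its own centroid.
    return sum(math.dist(p, centroids[a]) ** 2 for p, a in zip(points, assignment))


wcss = {k: kmeans_wcss(DATA, k) for k in range(1, 11)}
for k, value in wcss.items():
    print(k, round(value, 2))
```

The elbow is the k after which the printed value stops dropping sharply.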
Distances

• Euclidean distance- what you use regularly


• Manhattan distance- what Google Maps would use in Manhattan
• Chebyshev distance- $\max(|x_2 - x_1|, |y_2 - y_1|)$
• Minkowski or generalised distance- $(|x_2 - x_1|^p + |y_2 - y_1|^p)^{1/p}$
• Hamming distance- minimum number of bit changes in
transforming one binary sequence into another
• Mahalanobis distance- used in multivariate anomaly detection
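
Most of the distances above fit in a line or two of Python. The sketch below uses two points from the clustering data set, (4, 21) and (10, 24), and two binary strings invented for illustration. The function names are chosen for this example. Mahalanobis distance is left out because it needs the covariance matrix of the data.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def chebyshev(p, q):
    return max(abs(a - b) for a, b in zip(p, q))


def minkowski(p, q, power):
    # power=1 gives Manhattan distance, power=2 gives Euclidean distance.
    return sum(abs(a - b) ** power for a, b in zip(p, q)) ** (1 / power)


def hamming(s, t):
    # Number of positions at which two equally long sequences differ.
    if len(s) != len(t):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(s, t))


p, q = (4, 21), (10, 24)
print("Euclidean:", round(euclidean(p, q), 3))
print("Manhattan:", manhattan(p, q))
print("Chebyshev:", chebyshev(p, q))
print("Minkowski (power=3):", round(minkowski(p, q, 3), 3))
print("Hamming:", hamming("10110", "10011"))
```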
What is Python?

• Python is a high-level, interpreted programming language known
for its easy readability and simple syntax.
• It was created by Guido van Rossum and first released in 1991.
• Some key features:
• Interpreted language
• Dynamic typing
• High level language
• Extensive libraries
Why Python?

• Versatility
• Community and support
• Ease of learning
• Career opportunities
Python installation

• Easiest is to go ahead with Anaconda


• A harder way is to install Python and the required libraries using
the command pip install library_name
• To verify your installation, check the version with the command
python --version
Python IDE

• If you choose the harder route, you will get IDLE IDE
• If you have some experience of coding, go with PyCharm
• But easiest is to go with Jupyter notebook

• We will use Jupyter Notebook


• And as we learn we will see why we went with it
Writing your first iPython file

• Please make a folder called roll_name within the Documents


folder of the PC
• Create a subfolder named Lecture 2
• Copy the path to this folder
• Open Jupyter notebook
• Create a new python program within folder named Lecture 2
• At the end of the class, zip the folder and mail it to yourself
Holy grail of programming

• First program: “Hello World!”


• Let us add a comment to it- “My first program in PBA”
• Adding multiline comments
My first program in PBA course
I am starting with coding in python
I will apply it to analytics
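
Put together, the first program could look like the sketch below. The variable name `message` is an assumption made for this example. Python has no dedicated multi-line comment syntax: a triple-quoted string that is not assigned to anything is commonly used for that purpose, and a `#` at the start of every line works equally well.

```python
# My first program in PBA
message = "Hello World!"
print(message)

"""
My first program in PBA course
I am starting with coding in python
I will apply it to analytics
"""
```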
Next step

• Asking for name and printing Welcome <Name>


• Input command
name = input("Enter your name: ")
• Print statement
print("Welcome", name)
Basic data types

• Integers
• Floating point numbers
• Strings
• Boolean
Some examples

• Define PI=3.1415
• Ask for the radius of a circle
• Calculate the circumference and Area
• Print it
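
A possible solution to the exercise above. So that it runs without waiting for keyboard input, the radius is hard-coded as text here. In class, the commented-out `input()` line would replace it. The variable names are assumptions made for this example.

```python
PI = 3.1415  # value given on the slide; math.pi is more precise

# radius_text = input("Enter the radius: ")   # interactive version
radius_text = "2"  # stand-in for what the user would type
radius = float(radius_text)  # input() always returns text, so convert it

circumference = 2 * PI * radius
area = PI * radius ** 2

print("Circumference:", circumference)
print("Area:", area)
```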
Type conversion

• Dynamic typing, so there is no need to declare variable types


• Changing one type to another
• int()
• float()
• str()
• Basic datatypes are immutable
• We can check the memory location by command id(variable_name)
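
A short sketch of the three conversion functions listed above. The values and variable names are invented for illustration.

```python
age_text = "21"            # text, for example as returned by input()
age = int(age_text)        # string -> integer
height = float("1.75")     # string -> floating point number
label = str(age + 1)       # integer -> string
truncated = int(3.99)      # float -> integer drops the decimal part (no rounding)

print(type(age_text), type(age), type(height), type(label))
print(age, height, label, truncated)
```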
Try the following code

x=5
print(id(x))
x += 1
print(id(x))

• Are the memory locations the same?


Composite data type

• Lists [mutable]
• Tuples (Immutable)
• Dictionaries {paired key: value}
• Sets {unordered collection of distinct items, which may be of different data types}
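
A quick sketch of the four composite types in action. All values and names are invented for illustration.

```python
marks = [72, 85, 90]            # list: mutable
marks.append(64)                # lists can grow and change in place

point = (4, 21)                 # tuple: immutable
try:
    point[0] = 5                # not allowed: tuples cannot be changed
    tuple_is_immutable = False
except TypeError:
    tuple_is_immutable = True

student = {"name": "Asha", "roll": 12}   # dictionary: key-value pairs
student["marks"] = marks                 # add a new key

tags = {"BA", "BI", "BA", 3.5}  # set: the duplicate "BA" is stored only once

print(marks, point, student, tags, tuple_is_immutable)
```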
Specialized data types

• Arrays (from ‘array’ module)


• DataFrames (from ‘Pandas’ library)
• NumPy Arrays (from ‘NumPy’ library)
‘None’ type

• A special type representing the absence of a value or a null value.
It's often used as a placeholder for optional or missing values
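
A small sketch of `None` as a placeholder for a missing value. The function `describe_customer` and the customer details are hypothetical, made up for this example.

```python
def describe_customer(name, email=None):
    """Return a label for a customer; the email is optional."""
    if email is None:  # 'is None' is the usual way to test for a missing value
        return f"{name} (no email on file)"
    return f"{name} <{email}>"


print(describe_customer("Asha"))
print(describe_customer("Ravi", "ravi@example.com"))
```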
