
Road Map to Predictive Analytics

Dr. P.K.Viswanathan
Professor(Analytics)
The present competitive environment is witnessing a cornucopia of data that is increasing at an astonishing rate, beyond human imagination.
AI is the New Electricity

"AI will transform every industry just like electricity transformed them 100 years back."
- Andrew Ng
Connection between Analytics and AI/ML/DL

▪ Artificial Intelligence (AI) is the major field
▪ Machine Learning (ML) is a subfield of AI
▪ Deep Learning (DL) is a subfield of ML
Pillars of Analytics

▪ Descriptive Analytics: What has happened?
▪ Diagnostic Analytics: Why has it happened?
▪ Predictive Analytics: What will happen?
▪ Prescriptive Analytics: What should be done?


What is Predictive Analytics?

▪ Predictive analytics involves the use of data and quantitative modeling to predict future trends and events. Predictive analytics generates potential future scenarios that can help drive strategic decisions.

▪ In today's internet and information technology world, predictive analytics uses machine-learning algorithms to automate strategic decisions.
Predictive Analytics: Examples
▪ I have a large amount of data on various customer characteristics. Can you segment the market appropriately and then predict, within each segment, whether a customer will buy my new product? Which segment has the highest probability of buying?

▪ Can you predict when customer churn will take place so that my company can take appropriate action and save a lot of money?

▪ What is the chance that a customer will default on a loan if I choose to grant it?
▪ What is the market demand for the new product that I would like to launch?
Why the term “Predictive Analytics”?

A data set is split into training data and test data. The training data, together with the algorithms, produces the predictive model; the model is then applied to the test data to generate predictions.


Supervised Learning

Nature of Y      Nature of X    Model to Use?
1) Continuous    Continuous     Multiple Regression
2) Continuous    Categorical    Dummy Regression
3) Continuous    Mixed          Multiple Regression (Dummy Coding for Categorical)
Supervised Learning

Nature of Y         Nature of X    Model to Use?
4) Binary (0/1)     Continuous     Logistic Regression / Discriminant Analysis
5) Binary (0/1)     Categorical    Logistic Regression
6) MultiClass (>2)  Continuous     Multiple Discriminant Analysis


Supervised Learning
Modern Classifiers

• CART

• Neural Nets

• Random Forest

• Support Vector Machines(SVM)

• Naive Bayes
Unsupervised Learning

Nature of X     Model to Use
1) Continuous   If the variables are highly correlated, collapse them into dimensions using Principal Component Analysis.
2) Continuous   If the aim is to reduce the number of objects, use Cluster Analysis for segmenting into groups.
3) Categorical  Use Correspondence Analysis for dimension derivation and clustering.
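As a minimal illustration of rows 1) and 2) above, here is a hedged Python sketch using scikit-learn; the matrix X, the number of components, and the number of clusters are placeholder choices, not part of the original material.

```python
# Sketch only: PCA for dimension reduction and k-means for segmentation
# on a continuous predictor matrix X. All sizes below are illustrative.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

X = np.random.rand(100, 6)                      # stand-in for the real data

X_std = StandardScaler().fit_transform(X)       # both methods are scale sensitive

pca = PCA(n_components=2)                       # collapse correlated variables into 2 dimensions
scores = pca.fit_transform(X_std)
print("Variance explained:", pca.explained_variance_ratio_)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)   # segment objects into 3 groups
segments = kmeans.fit_predict(X_std)
print("First ten segment labels:", segments[:10])
```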
Quick-Review Test

1) In an analytic study to understand consumer behavior toward buying, the response variable rating is continuous (a scale of 1 to 7 was used) and it depends on advertisement with three levels (low budget, medium budget, and high budget) and price with two levels (high and low). The model to predict consumer rating is

Dummy Regression (Preference Decomposition and Conjoint Analysis are also correct)
Quick-Review Test

2) If the objective is to classify consumers into low risk takers, medium risk takers, and high risk takers based on key characteristics, the models that can be used are

Discriminant Analysis, Logistic Regression, and Neural Networks.
BABI-Review Test

3) In a predictive modeling study to predict loan default, two independent variables were used, namely Income and Current Loans in Credit Card. The logistic regression gave the odds ratio (Exp(Beta)) corresponding to Current Loans in Credit Card as 2.78. Interpret this number.
[0 represents Not a Defaulter and 1 represents a Defaulter]

For every unit increase in Current Loans, the odds of being a defaulter (versus a non-defaulter) are multiplied by 2.78.
Quick-Review Test

4) When a very large number of variables are involved in a study to understand selling behavior and the objective is to collapse these variables into manageable dimensions, the appropriate technique is

Principal Component Analysis
Quick-Review Test

5) When we want to understand interaction between factors and the relationship between variables, we use

ANOVA and Correlation (Correlation and Regression is also correct)
Logistic Regression-A Conceptual Framework

Presentation
Dr. P.K.Viswanathan
Professor(Analytics)
Logistic Regression-Examples

▪ Banking: What is the likelihood that someone will default on a loan or prepay her mortgage?

▪ Marketing: What is the likelihood of someone responding to a mail campaign?

▪ Medicine: What is the likelihood a patient will get well or die?

▪ Fraud Detection: What is the likelihood a transaction/claim is fraudulent?
Why Logistic Regression?

▪ No matter how hard we try, there is no guarantee in OLS regression that the predicted probability of the dependent variable will lie in the range 0 to 1.

▪ In all likelihood, a few observations will fall outside 0 to 1, which makes no sense for a probability.

▪ Hence Logistic Regression is used by analytics professionals.


Odds and Probabilities

Probability = Odds / (Odds + 1)
Odds = Probability / (1 - Probability)
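A quick sketch of these two conversions in Python (nothing assumed beyond the formulas above):

```python
def odds_from_probability(p):
    """Odds = Probability / (1 - Probability); p must be strictly between 0 and 1."""
    return p / (1.0 - p)

def probability_from_odds(odds):
    """Probability = Odds / (Odds + 1); odds must be non-negative."""
    return odds / (odds + 1.0)

print(odds_from_probability(0.8))   # 4.0  (odds of 4 to 1)
print(probability_from_odds(4.0))   # 0.8
```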
Why Odds Anyway?

▪ Odds are used to counteract the fact that linear regression produces probability values outside the range of 0 and 1.

▪ Working with the odds removes the upper bound of 1, since odds run from 0 to infinity; taking the natural log of the odds removes the lower bound of 0, so the log-odds can be modeled as a linear function of the predictors.
Visual of Logit Curve
Logistic Regression

◼ Logistic Regression Equation

The relationship between probability P and X1, X2, . . . , Xk is described by the following equation:

P = e^Z / (1 + e^Z)

Z = b0 + b1X1 + b2X2 + ... + bkXk
X1, X2, . . . , Xk are the predictor variables
P represents the probability that Y = 1
1 - P represents the probability that Y = 0
Logistic Regression
Maximum Likelihood Estimation (MLE)

When Y = 1, L = P
When Y = 0, L = 1 - P
This is for a single data point; the likelihood for the whole sample is the product over all data points.
Maximizing L is the same as maximizing Log L (base e):

Log L = Σ Y log(P) + Σ (1 - Y) log(1 - P)
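A small NumPy sketch of this log-likelihood for a given set of coefficients; the arrays y and X and the coefficients b0, b are placeholders for whatever data and estimates are at hand.

```python
# Sketch: Log L = sum[ Y*log(P) + (1-Y)*log(1-P) ] with P = e^Z / (1 + e^Z).
import numpy as np

def log_likelihood(b0, b, X, y):
    z = b0 + X @ b                       # Z = b0 + b1*X1 + ... + bk*Xk
    p = 1.0 / (1.0 + np.exp(-z))         # algebraically equal to e^Z / (1 + e^Z)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Maximum likelihood estimation chooses b0 and b to make this value as large as possible.
```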
Walk the Talk
Simmons Catalogue¹

Simmons' catalogs are expensive, and Simmons would like to send them only to those customers who have the highest probability of making a $200 purchase using the discount coupon included in the catalog. Simmons' management thinks that annual spending at Simmons Stores and whether a customer has a Simmons credit card are two variables that might be helpful in predicting whether a customer who receives the catalog will use the coupon to make a $200 purchase.

1. Adapted from Anderson, Sweeney, and Williams purely for classroom discussion
Simmons Catalogue-Continues
Simmons conducted a study by sending out 100 catalogs: 50 to customers who have a Simmons credit card and 50 to customers who do not have the card. At the end of the test period, Simmons noted for each of the 100 customers:
1) the amount the customer spent last year at Simmons,
2) whether the customer had a Simmons credit card, and
3) whether the customer made a $200 purchase.
The data file that contains the information is Logit-Simmons.csv.

Develop a logistic regression model, obtain the output, and interpret the results.
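A hedged sketch of how the model might be fitted in Python with statsmodels; the column names Spending, Card, and Purchase are assumptions about the headers in Logit-Simmons.csv, not confirmed by the file.

```python
# Sketch only: logistic regression for the Simmons study.
# Column names Spending, Card, Purchase are assumed, not confirmed.
import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("Logit-Simmons.csv")

X = sm.add_constant(df[["Spending", "Card"]])   # annual spending, credit-card indicator
y = df["Purchase"]                              # 1 = made the $200 purchase, 0 = did not

model = sm.Logit(y, X).fit()
print(model.summary())                          # coefficients and p-values
print(np.exp(model.params))                     # odds ratios, Exp(Beta)
```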
Example Problem-Books By Mail from Paul Green purely for
Classroom Discussion

• The Books By Mail company is interested in offering a new title called The Art History of Florence and sent a test mailing to 1,000 existing customers. Of these, 83 actually purchased the book, a response rate of 8.3 percent. The company also sent out an identical mailing to another 1,000 customers to serve as a holdout sample. The scope of the study is primarily confined to predicting whether a customer will buy the new book or not, based on two input variables, namely months since last purchase and number of art books purchased. The data files of the existing customers and the holdout sample are given in "PaulBooks1.csv" and "Paulbooks2.csv" respectively.
Any Practical Value for Books By Mail?

We can assess the operational significance of the model by using it to determine a mailing strategy for the 1,000 customers in the holdout sample and then assessing the profitability of the strategy. The cost of mailing an offer to purchase The Art History of Florence is $1; if the customer responds and purchases the book, then the net profit (after the cost of mailing) is $6. What should be the mailing strategy?
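One way to reason about it: mail a customer only when the expected profit p × $6 - (1 - p) × $1 is positive, i.e. when the predicted purchase probability p exceeds 1/7 ≈ 0.143. The sketch below assumes the column names Months, ArtBooks, and Buy for the two predictors and the response; they are placeholders, not the actual headers.

```python
# Sketch: expected-profit mailing rule. Mail only if p*6 - (1-p)*1 > 0, i.e. p > 1/7.
# Column names Months, ArtBooks, Buy are assumptions about the CSV headers.
import pandas as pd
from sklearn.linear_model import LogisticRegression

train = pd.read_csv("PaulBooks1.csv")
holdout = pd.read_csv("Paulbooks2.csv")

features = ["Months", "ArtBooks"]        # months since last purchase, art books purchased
clf = LogisticRegression().fit(train[features], train["Buy"])

p = clf.predict_proba(holdout[features])[:, 1]
mail = p > 1.0 / 7.0                     # break-even purchase probability

expected_profit = (p[mail] * 6 - (1 - p[mail]) * 1).sum()
print(f"Mail {mail.sum()} of {len(holdout)} customers; expected profit about ${expected_profit:.2f}")
```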
Fisher’s Linear Discriminant Analysis
A Multidimensional Perspective

Presentation
Dr. P.K.Viswanathan
Professor(Analytics)
What is Discriminant Analysis?

▪ Do heavy users and light users of our product differ in some other characteristics?
▪ Is income a good predictor of this user status?
▪ Does amount of formal education discriminate between viewers and non-viewers?

The objective of discriminant analysis is to use the information from the predictor variables to achieve the clearest possible separation or discrimination among groups.
The Three Key Goals of Discriminant Analysis

1. Profiling

2. Differentiation

3. Classification
Applications of LDA
▪ In a textile mill, cotton quality depends on its chemical characteristics. LDA can create the required score. If the score is more than a threshold value (cut-off point), cotton quality is Good; else Bad.

▪ In sanctioning a loan for a customer, a bank uses a number of financial indicators. The LDA score developed based on these indicators can be used as a classifier. If the score is more than a threshold value, classify the customer as a defaulter; else a non-defaulter.

▪ For MBA admission in a business school, whether to admit a candidate or not can be based on a discriminant score built from past scholastic record (Grade Point Average), GMAT score, and performance score in the interview.

▪ The discussion here will be confined to two groups only, as most of the applications involve a dichotomous situation. However, LDA can easily handle multiple classes.
Math Behind LDA

Z = a1x1 + a2x2 + ... + akxk
Z1 = a1x1(I) + a2x2(I) + ... + akxk(I)
Z2 = a1x1(II) + a2x2(II) + ... + akxk(II)
Z1 - Z2 = |a1D1 + a2D2 + a3D3 + ... + akDk| = |aD|

where D1, D2, ..., Dk are the differences in means between the two groups for the predictor variables x1, x2, ..., xk respectively. The values of a1, a2, ..., ak will be chosen so as to

Maximize Z1 - Z2
subject to the constraint Var(Z) = 1
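A minimal NumPy sketch of this maximization, using the standard closed-form result that the maximizing weights are proportional to S⁻¹D, where S is the pooled within-group covariance matrix; the arrays X1 and X2 (one row per observation in each group) are placeholders.

```python
# Sketch: Fisher discriminant weights a proportional to S_pooled^{-1} D,
# rescaled so that Var(Z) = a' S a = 1.
import numpy as np

def fisher_weights(X1, X2):
    D = X1.mean(axis=0) - X2.mean(axis=0)                  # differences in group means
    S1 = np.cov(X1, rowvar=False)
    S2 = np.cov(X2, rowvar=False)
    n1, n2 = len(X1), len(X2)
    S = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)    # pooled within-group covariance
    a = np.linalg.solve(S, D)                              # direction maximizing Z1 - Z2
    a /= np.sqrt(a @ S @ a)                                # enforce Var(Z) = 1
    return a

# Z scores for new observations: Z = X_new @ a
```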
Data Set
The seminal paper of Altman[1] classified and predicted corporate bankruptcy
based on a set of financial ratios. Z score of Fisher’s linear discriminant analysis
was employed to classify the firm into either “Bankrupt” or “Solvent”. The data
used in the study were from manufacturing corporations. The data set has 33
bankrupt firms and 33 solvent firms. The central goal was whether the bankrupt
firms and solvent firms could be sharply differentiated (separated) in terms of five
financial ratios. They are Working Capital/Total Assets(WCTA), Retained
Earnings/Total Assets(RETA), Earnings Before Interest and Taxes/Total
Assets(EBITTA), Market Value of Equity/Book Value of Total Debt(MVEBVTD), and
Sales/Total Assets (SATA). The original data set has been obtained from Morrison's book
on “Multivariate Statistical Analysis”.

[The abbreviations within brackets are made for ease of identifying the ratios].
Brief Description of the Ratios

▪ WCTA: The Working Capital/Total Assets ratio, frequently found in studies of corporate problems, is a measure of the net liquid assets of the firm relative to the total capitalization. Ordinarily, a firm experiencing consistent operating losses will have shrinking current assets in relation to total assets.

▪ RETA: Retained Earnings/Total Assets is a measure of cumulative profitability over time. The age of a firm is implicitly considered in this ratio. For example, a relatively young firm will probably show a low RETA ratio and is more vulnerable to becoming "Bankrupt" compared with well-established older firms that would have accumulated substantial earnings.
Brief Description of the Ratios

▪ EBITTA: This ratio is calculated by dividing a firm's earnings before interest and taxes by its total assets. Since a firm's ultimate existence is based on the earning power of its assets, this ratio appears to be particularly appropriate for studies dealing with corporate failure.

▪ MVEBVTD: This ratio measure shows how much the firm's assets can decline in
value (measured by market value of equity plus debt) before the liabilities exceed
the assets and the firm becomes insolvent. It also appears to be a more effective
predictor of bankruptcy than the more commonly used ratio: Net worth/Total debt
Brief Description of the Ratios

▪ SATA: The capital-turnover ratio is a standard financial ratio illustrating the sales
generating ability of the firm's assets. It is one measure of management's capability
in dealing with competitive conditions. This final ratio is quite important because of
its unique relationship to other variables in the model. Statistically speaking,
perhaps, this ratio would appear to be least significant in discriminating power.
Profiling-Descriptive

Group Means
WCTA RETA EBITTA MVEBVTD SATA
Bankrupt -6.05 -62.51 -31.78 40.05 1.50
Solvent 41.38 35.25 15.32 254.67 1.94
Differentiation-Visual-WCTA
Differentiation-Visual-RETA
Differentiation-Visual-EBITTA
Differentiation-Visual-MVEBVTD
Differentiation-Visual-SATA
LDA -Z score Equation

Hyperplane Equation(Z score)

Z=0.0153WCTA+0.0183RETA+0.0418EBITTA+0.0077MVEBVTD+1.2543SATA

Cut off Point Score=2.9714

If the Score is >= 2.9714, Predict "Solvent"
If the Score is < 2.9714, Predict "Bankrupt"
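A hedged sketch applying this rule in Python; it assumes the ratio columns in the data frame are named exactly WCTA, RETA, EBITTA, MVEBVTD, and SATA.

```python
# Sketch: apply the fitted Z-score equation and the 2.9714 cut-off.
import pandas as pd

def classify_firms(df):
    z = (0.0153 * df["WCTA"] + 0.0183 * df["RETA"] + 0.0418 * df["EBITTA"]
         + 0.0077 * df["MVEBVTD"] + 1.2543 * df["SATA"])
    return pd.Series(["Solvent" if s >= 2.9714 else "Bankrupt" for s in z], index=df.index)

# Usage: predictions = classify_firms(pd.read_csv("firms.csv"))  # file name is a placeholder
```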
Confusion Matrix

Accuracy: 95.45%
Correlation between DA (Z Scores) and Input Variables (in absolute terms)

Input Variable   Correlation with DA   Rank
WCTA             0.7304                3
RETA             0.8702                1
EBITTA           0.6809                4
MVEBVTD          0.7352                2
SATA             0.2589                5
ROC Curve
Discriminant Analysis

• LDA was discovered by Ronald Fisher
• The three goals of LDA are Profiling, Differentiation, and Classification
• It has wide applications and is easy to implement
• It is capable of handling multiclass problems
• t = sqrt(r²/(1 - r²)) × sqrt(n - 2) follows a t distribution with n - 2 df (used to test the significance of a correlation r)
Support Vector Machines-A Conceptual
Framework
Dr. P.K.Viswanathan
Professor Analytics
What is SVM?
▪ Support Vector Machines (SVM), originally invented in 1963 by Vladimir N. Vapnik, have the goal of achieving the largest separation between classes.
▪ Support Vector Machines are based on the concept of
hyperplanes that define decision boundaries.
▪ A hyperplane is one that separates the objects of one
class from another.
Recruitment Example

[Scatter plot of candidates: each point is marked either "Hire the Candidate" or "Do not Hire the Candidate"]

X1 is the Aptitude Test Score
X2 is the Interview Performance Score
Linearly separable data points

• Data that can be separated by a line (or, in general, a hyperplane) is known as linearly separable data.
• The hyperplane acts as a linear classifier.
Good vs Bad Separator?
SVM Scenario
▪ Find lines that correctly classify the training data.
▪ Among all such lines, pick the one that has the greatest distance to the points closest to it.
▪ The closest points that identify this line are known as support vectors.
▪ The region they define around the line is known as the margin.

Math Behind SVM

• Hyperplane: wᵀx + b = 0
• The margin hyperplanes through the closest points xa and xb:
  wᵀxa + b = 1 and wᵀxb + b = -1
• Extra scale constraint: min over i = 1, ..., n of |wᵀxi + b| = 1
• This implies:
  wᵀ(xa - xb) = 2
  margin width d = ||xa - xb|| = 2/||w||
Math Behind SVM
• We can formulate the quadratic optimization problem:

Find w and b such that
d = 2/||w|| is maximized; and for all {(xi, yi)}:
wᵀxi + b ≥ 1 if yi = 1; wᵀxi + b ≤ -1 if yi = -1

• A better formulation:

Find w and b such that
Φ(w) = ½ wᵀw is minimized;
and for all {(xi, yi)}: yi(wᵀxi + b) ≥ 1

Math Behind SVM Simplified

Minimize (W1² + W2² + W3² + ... + Wk²)/2

Subject to the constraints (one for each of the n data points, the X values in each constraint being those of that data point):

Y1(W1X1 + W2X2 + W3X3 + ... + WkXk + b) >= 1
Y2(W1X1 + W2X2 + W3X3 + ... + WkXk + b) >= 1
...
Yn(W1X1 + W2X2 + W3X3 + ... + WkXk + b) >= 1
Learning SVM–Classroom Exercise-Walk the Talk
*The file DiscriWinstonFR.csv contains information on the following items about 24 companies: EBITASS (Earnings Before Interest and Taxes, divided by Total Assets), ROTC (Return on Total Capital), and Group (1 for "Most Admired" and 2 for "Least Admired" companies).

1) Apply SVM to classify a company as a most admired or least admired company.
2) Draw the hyperplane separating the data points with (Wx+b=0, Wx+b=-1, Wx+b=+1).

*Problem adapted from "Management Science Modeling" by Winston and Albright purely for Classroom Discussion
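A hedged scikit-learn sketch for step 1; the column names EBITASS, ROTC, and Group come from the description above, and the choice of a linear kernel with a large C (to approximate a hard margin) is a modeling assumption, not part of the original exercise.

```python
# Sketch: linear SVM on the two financial ratios; a large C approximates a hard margin.
import pandas as pd
from sklearn.svm import SVC

df = pd.read_csv("DiscriWinstonFR.csv")
X = df[["EBITASS", "ROTC"]]
y = df["Group"]                           # 1 = Most Admired, 2 = Least Admired

svm = SVC(kernel="linear", C=1e6).fit(X, y)

w = svm.coef_[0]                          # hyperplane weights W
b = svm.intercept_[0]                     # offset b, so the boundary is Wx + b = 0
print(f"Hyperplane: {w[0]:.2f}*EBITASS + {w[1]:.2f}*ROTC + {b:.2f} = 0")
print("Training accuracy:", svm.score(X, y))
```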
Hyperplane Separating Data Points

[Scatter plot of ROTC (vertical axis) versus EBITASS (horizontal axis) for the 24 companies, with the separating hyperplane Wx + b = 0]
Hyperplane Separating the Data Points With Decision Boundaries

[Scatter plot of ROTC versus EBITASS showing the lines Wx + b = 0, Wx + b = -1, and Wx + b = +1. Hyperplane equation: 24.97X1 + 91.98X2 - 14.49 = 0]
Confusion Matrix

                 Predicted
Actual           Most Admired   Least Admired
Most Admired     12             0
Least Admired    0              12

Accuracy: 100.00%
Dataset with noise

◼ Hard Margin: So far we require all data points to be classified correctly
  - No training error
◼ What if the training set is noisy?
  - Solution 1: use very powerful kernels
    OVERFITTING!
Hard Margin vs. Soft Margin
◼ The old formulation:

Find w and b such that
Φ(w) = ½ wᵀw is minimized and for all {(xi, yi)}:
yi(wᵀxi + b) ≥ 1

◼ The new formulation incorporating slack variables:

Find w and b such that
Φ(w) = ½ wᵀw + CΣξi is minimized and for all {(xi, yi)}:
yi(wᵀxi + b) ≥ 1 - ξi and ξi ≥ 0 for all i

◼ Parameter C can be viewed as a way to control overfitting.
Non-linear SVMs: Feature spaces
◼ General idea: the original input space can always be mapped
to some higher-dimensional feature space where the training
set is separable:

Φ: x → φ(x)
The “Kernel Trick”
◼ The linear classifier relies on the dot product between vectors: K(xi, xj) = xiᵀxj
◼ If every data point is mapped into a high-dimensional space via some transformation Φ: x → φ(x), the dot product becomes:
K(xi, xj) = φ(xi)ᵀφ(xj)
◼ A kernel function is some function that corresponds to an inner product in some expanded feature space.
◼ Example:
2-dimensional vectors x = [x1 x2]; let K(xi, xj) = (1 + xiᵀxj)².
Need to show that K(xi, xj) = φ(xi)ᵀφ(xj):
K(xi, xj) = (1 + xiᵀxj)²
= 1 + xi1²xj1² + 2xi1xj1xi2xj2 + xi2²xj2² + 2xi1xj1 + 2xi2xj2
= [1  xi1²  √2 xi1xi2  xi2²  √2 xi1  √2 xi2]ᵀ [1  xj1²  √2 xj1xj2  xj2²  √2 xj1  √2 xj2]
= φ(xi)ᵀφ(xj), where φ(x) = [1  x1²  √2 x1x2  x2²  √2 x1  √2 x2]
Examples of Kernel Functions
◼ Linear: K(xi, xj) = xiᵀxj

◼ Polynomial of power p: K(xi, xj) = (1 + xiᵀxj)^p

◼ Gaussian (radial-basis function network): K(xi, xj) = exp(-‖xi - xj‖² / (2σ²))

◼ Sigmoid: K(xi, xj) = tanh(β0 xiᵀxj + β1)
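For reference, these kernels correspond to the `kernel` argument of scikit-learn's SVC; the parameter values below are illustrative only.

```python
# Sketch: the kernels above expressed as SVC configurations (illustrative values).
from sklearn.svm import SVC

linear_svm  = SVC(kernel="linear", C=1.0)
poly_svm    = SVC(kernel="poly", degree=2, gamma=1.0, coef0=1.0, C=1.0)   # (1 + xi.xj)^p with p = 2
rbf_svm     = SVC(kernel="rbf", gamma=0.5, C=1.0)                         # gamma = 1 / (2*sigma^2)
sigmoid_svm = SVC(kernel="sigmoid", gamma=0.5, coef0=0.0, C=1.0)          # tanh(gamma*xi.xj + coef0)
```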


Properties of SVM
• Flexibility in choosing a similarity function
• Sparseness of solution when dealing with large data sets
• Ability to handle large feature spaces
• Nice math property: a simple convex optimization problem
which is guaranteed to converge to a single global solution
• Feature Selection
SVM Applications

▪ Text (and hypertext) categorization
▪ Image classification
▪ Cancer classification
▪ Hand-written character recognition
Decision Tree and Random Forest

Presentation
Dr. P.K.Viswanathan
Professor(Analytics)
Components of a decision tree

▪ Root node
▪ Internal nodes / decision nodes
▪ Leaf nodes / terminal nodes
Generator Example

• An electric generator manufacturer would like to find a way of classifying businesses in a city into those likely to purchase a generator and those not likely to buy one.

• Data: 12 owners and 12 nonowners of generators in the city, their electricity consumption and their business income.
Dataset

Classify those likely to purchase using the Income and electricity Consumption.

Income   Consumption   Ownership
60       18.4          Owner
85.5     16.8          Owner
64.8     21.6          Owner
61.5     20.8          Owner
87       23.6          Owner
110.1    19.2          Owner
108      17.6          Owner
82.8     22.4          Owner
69       20            Owner
93       20.8          Owner
51       22            Owner
81       20            Owner
75       19.6          Nonowner
52.8     20.8          Nonowner
64.8     17.2          Nonowner
43.2     20.4          Nonowner
84       17.6          Nonowner
49.2     17.6          Nonowner
59.4     16            Nonowner
66       18.4          Nonowner
47.4     16.4          Nonowner
33       18.8          Nonowner
51       14            Nonowner
63       14.8          Nonowner
How decision tree works – step 1

Scatterplot of Income vs Consumption: owners vs non-owners. The procedure will choose Income for the first split, with a splitting value of 60. The split creates 2 rectangles, each more homogeneous than the rectangle before the split. The left rectangle contains points that are mostly nonowners (seven nonowners and one owner); the right rectangle contains mostly owners (11 owners and five nonowners).

Basis of split
Split points are ranked according to how much they reduce impurity (heterogeneity) in the resulting rectangles. A pure rectangle is one that is composed of a single class (e.g., owners).
How decision tree works – step 2

Splitting the 24 records first by the Income value of 60, the next split is on Consumption of 21. This yields a pure node of 1 owner, a pure node of 7 owners, and a pure node of 7 nonowners. The final stage of recursive partitioning leaves each rectangle consisting of a single class (owners or nonowners).
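A small scikit-learn sketch of the same exercise, using the 24 records from the Dataset slide; the column names are chosen for this example, and the printed split values may differ slightly from 60 and 21 because scikit-learn splits midway between observed values.

```python
# Sketch: fit a classification tree on the 24 generator records and print its splits.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

income = [60, 85.5, 64.8, 61.5, 87, 110.1, 108, 82.8, 69, 93, 51, 81,
          75, 52.8, 64.8, 43.2, 84, 49.2, 59.4, 66, 47.4, 33, 51, 63]
consumption = [18.4, 16.8, 21.6, 20.8, 23.6, 19.2, 17.6, 22.4, 20, 20.8, 22, 20,
               19.6, 20.8, 17.2, 20.4, 17.6, 17.6, 16, 18.4, 16.4, 18.8, 14, 14.8]
ownership = ["Owner"] * 12 + ["Nonowner"] * 12

df = pd.DataFrame({"Income": income, "Consumption": consumption, "Ownership": ownership})

tree = DecisionTreeClassifier(criterion="gini", random_state=0)
tree.fit(df[["Income", "Consumption"]], df["Ownership"])
print(export_text(tree, feature_names=["Income", "Consumption"]))
```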
Measures of impurity

• Gini Index
• Entropy measure

1. Calculate GINI for the overall rectangle

1 - (12/24)² - (12/24)² = 0.5
2. Calculation of GINI Index for the left and right rectangles

GINI index for the left rectangle: 1 - (7/8)² - (1/8)² ≈ 0.219
GINI index for the right rectangle: 1 - (11/16)² - (5/16)² ≈ 0.43
3. Weighted average of impurity measures

(8/24) × 0.219 + (16/24) × 0.43 = 0.359
GINI Index before & after the split

Gini index of the original rectangle: 0.5
Gini index after the split: 0.359
Steps in calculating GINI Index

Calculate GINI for the overall rectangle → calculate GINI for the left rectangle → calculate GINI for the right rectangle → combine the impurity of left and right as the weighted average of the impurity measures.
1. Calculate Entropy for the overall rectangle

-(12/24) × log2(12/24) - (12/24) × log2(12/24) = 1
2. Calculation of entropy for the left and right rectangles

Entropy for the left rectangle: -(7/8) × log2(7/8) - (1/8) × log2(1/8) ≈ 0.54
Entropy for the right rectangle: -(11/16) × log2(11/16) - (5/16) × log2(5/16) ≈ 0.89
3. Weighted average of entropy

(8/24) × 0.54 + (16/24) × 0.89 ≈ 0.779
Entropy before & after the split

Entropy of the original rectangle: 1
Entropy after the split: 0.779
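The numbers above can be verified with a few lines of Python; the counts are taken from the Income = 60 split shown earlier (left rectangle: 1 owner and 7 nonowners; right rectangle: 11 owners and 5 nonowners).

```python
# Sketch: verify the Gini and entropy figures for the split at Income = 60.
import numpy as np

def gini(counts):
    p = np.array(counts) / sum(counts)
    return 1 - np.sum(p ** 2)

def entropy(counts):
    p = np.array(counts) / sum(counts)
    return -np.sum(p * np.log2(p))

overall, left, right = [12, 12], [1, 7], [11, 5]      # (owners, nonowners)

for name, impurity in [("Gini", gini), ("Entropy", entropy)]:
    before = impurity(overall)
    after = (8 / 24) * impurity(left) + (16 / 24) * impurity(right)
    print(f"{name}: before = {before:.3f}, after = {after:.3f}, gain = {before - after:.3f}")
```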
Information Gain

The entropy typically changes when we use a node in a decision tree to partition the training instances into smaller subsets. Information gain is a measure of this change in entropy. In the generator example, the information gain from the first split is 1 - 0.779 = 0.221.
Parameters in decision tree learning

• Choosing the splitting criterion: impurity-based criteria, information gain
• Binary or multiway splits: multiway split, binary split
• Finding the right sized tree: pre-pruning, post-pruning
Random Forests
Ensemble methods

• A single decision tree does not perform well

• But, it is super fast

• What if we learn multiple trees?

We need to make sure they do not all just learn the same thing.
Random Forest
• This is a widely used ensemble technique in view of its superior performance and scalability.

• It is an ensemble of decision trees, where each decision tree is built from bootstrap samples (K out of N records, drawn with replacement) and a randomly selected subset of features (m out of p, without replacement). The decision trees are normally grown deep (without pruning).

• The hyperparameters that can be tuned to increase the model accuracy in a Random Forest model are (see the sketch below):
1. Number of decision trees.
2. Number of records and features to be sampled.
3. Depth and split criterion (Gini impurity index or entropy).
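A hedged scikit-learn sketch showing where each of the three hyperparameter groups appears; the values are illustrative placeholders, not tuned recommendations.

```python
# Sketch: the Random Forest hyperparameters above as scikit-learn arguments.
# X and y stand for the training predictors and labels.
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(
    n_estimators=500,        # 1. number of decision trees
    max_samples=0.8,         # 2. fraction of records bootstrapped for each tree
    max_features="sqrt",     #    number of features (m out of p) tried at each split
    max_depth=None,          # 3. grow trees deep (no pruning), as described above
    criterion="gini",        #    split criterion: "gini" or "entropy"
    random_state=0,
)
# rf.fit(X, y); rf.predict(X_new)
```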
