
Decision science

September 2022 Examination

Q1. Avantika Mattoo, working as an analyst in a reputed pharmaceutical company,
wants to invest her money in stocks. Her friends, having expertise in stock market
investments, suggested that she invest money in ‘Reliance’ and ‘Maruti’ shares.
Avantika’s economist on Avantika’s investments. The figures of return on
investment as per four different scenarios are presented in the table below.

Payoff (profit within one month on one unit of share, in INR)

Stock                    Scenario 1   Scenario 2   Scenario 3   Scenario 4
Reliance Industry Ltd.   55           43           29           15
Maruti                   26           38           43           51

1) Set up the opportunity loss table.

2) Draw the decision tree (Note: You may use any software for making tree
diagram, but snapshot of handwritten tree will be unacceptable)

3) According to Niharika’s latest research, she has assigned the following
probabilities to the four scenarios (states of nature); determine the EMV decision:
P(s1) = 0.4, P(s2) = 0.1, P(s3) = 0.3, P(s4) = 0.2.

(Note: Do mention the decision based on the analysis clearly in few sentences.)

(10 Marks)

Ans 1.
Introduction:
Opportunity loss (regret) is the gap between the maximum payoff attainable in a given state
of nature and the payoff actually realized from the action chosen. In a nutshell, it is the
loss suffered as a result of failing to take the best available course of action. Opportunity
losses are computed separately for each state of nature: for every course of action, subtract
its payoff from the maximum payoff available in that state of nature. The result is also
known as the conditional opportunity loss of that course of action. The expected monetary
value (EMV) of every course of action can then be calculated from the payoffs and the
probabilities of the states of nature.

Source: https://qsstudy.com/wp-content/uploads/2018/07/Opportunity-Cost-3.jpg

A decision tree is a structured, graph-like diagram that illustrates every potential outcome
for a given input using a branching mechanism. Decision trees can be hand-drawn or generated
using a graphics application or specialist software. Whenever a panel wants to evaluate
various alternatives and make a judgement, a decision tree can help focus the debate. It can
be used to portray decisions and decision making graphically and unambiguously and, as the
name suggests, it employs a tree-like model. A decision tree diagram is a form of flowchart
that lays out the available courses of action in order to simplify decision-making, and it
also depicts the possible results of each course of action. Decision trees are an effective
technique for deciding amongst multiple options: they give a clear structure for laying out
choices and investigating the potential results of those alternatives. In machine learning,
decision trees are built by iteratively assessing the available features and splitting on the
one that best separates the data at every node.
Source: https://image.shutterstock.com/image-vector/decision-tree-icon-data-analysis-
260nw-1215849010.jpg

Concept and Analysis:

The EMV of a specific course of action is its probability-weighted average payoff: the sum,
over all states of nature, of the payoff in each state multiplied by the probability of that
state. It can therefore be calculated for every course of action from the payoffs and the
probabilities of the states of nature.
Formula:
Expected Monetary Value (EMV) = Σ (Probability * Impact), summed over all states of nature

Source: https://encrypted-tbn0.gstatic.com/images?
q=tbn:ANd9GcTrq5hJP36_ZTkGdrUeRmoMFbbNhRz_V44-pA&usqp=CAU

Opportunity loss is the loss suffered as a result of failing to choose the best available
course of action. The amount of money forgone by not selecting the best option in a
particular state of nature is referred to as opportunity loss or regret. The minimax regret
criterion uses the opportunity loss table to find the alternative whose maximum opportunity
loss is the smallest.
In a decision tree analysis, expected monetary value analysis makes it simpler to estimate
risks, estimate the contingency reserve, and find the optimum option. The analysis assumes a
neutral risk attitude; otherwise, the estimate may be distorted. Furthermore, the accuracy of
this analysis depends on the input data, so a complete data quality evaluation should be
carried out. This strategy boosts confidence in meeting the organization's goal.
In projects, the Expected Monetary Value can be used to compare risks. It computes the
average outcome when the forecast contains uncertain events, which can be either favourable
(opportunities) or unfavourable (risks or threats). Risks are represented as negative
numbers, while opportunities are represented as positive numbers.

1. Opportunity Loss Table:

Payoff (profit within one month on one unit of share, in INR)

Stock                    Scenario 1   Scenario 2   Scenario 3   Scenario 4
Reliance Industry Ltd.   55           43           29           15
Maruti                   26           38           43           51

Calculation of opportunity loss (maximum payoff in the scenario minus the payoff)

Stock                    Scenario 1   Scenario 2   Scenario 3   Scenario 4
Reliance Industry Ltd.   (55-55)      (43-43)      (43-29)      (51-15)
Maruti                   (55-26)      (43-38)      (43-43)      (51-51)

Opportunity Loss Table (in INR)

Stock                    Scenario 1   Scenario 2   Scenario 3   Scenario 4
Reliance Industry Ltd.   0            0            14           36
Maruti                   29           5            0            0
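
The opportunity loss table can be reproduced with a few lines of Python. This is a minimal sketch, not part of the original answer: the dictionary layout and the names `opportunity_loss`, `max_regret` and `minimax_choice` are illustrative assumptions, and only the payoff figures come from the question.

```python
# Payoffs in INR per share, one value per scenario (taken from the question).
payoffs = {
    "Reliance Industry Ltd.": [55, 43, 29, 15],
    "Maruti": [26, 38, 43, 51],
}


def opportunity_loss(table):
    """Return the regret table: best payoff in each scenario minus each payoff."""
    columns = list(zip(*table.values()))      # one tuple of payoffs per scenario
    best = [max(col) for col in columns]      # best achievable payoff per scenario
    return {
        name: [b - v for b, v in zip(best, row)]
        for name, row in table.items()
    }


loss = opportunity_loss(payoffs)
for name, row in loss.items():
    print(f"{name:<24}", row)

# Minimax regret: pick the alternative whose worst-case regret is smallest.
max_regret = {name: max(row) for name, row in loss.items()}
minimax_choice = min(max_regret, key=max_regret.get)
print("Maximum regret per alternative:", max_regret)
print("Minimax regret choice:", minimax_choice)
```

Note that the minimax regret criterion ignores the scenario probabilities, so its recommendation need not coincide with the EMV decision.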

2. Decision Tree: Decision tree learning is a supervised learning approach used in statistics, data mining
and machine learning. In this formalism, a classification or regression decision tree is
used as a predictive model to draw conclusions about a set of observations. Tree models
where the target variable can take a discrete set of values are called classification trees; in
these tree structures, leaves represent class labels and branches represent conjunctions of
features that lead to those class labels. Decision trees where the target variable can take
continuous values (typically real numbers) are called regression trees. Decision trees are
among the most popular machine learning algorithms given their intelligibility and
simplicity. In decision analysis, a decision tree can be used to visually and explicitly
represent decisions and decision making. In data mining, a decision tree describes data
(but the resulting classification tree can be an input for decision making).

A tree has many analogies in real life, and turns out that it has influenced a wide area of
machine learning, covering both classification and regression. In decision analysis, a
decision tree can be used to visually and explicitly represent decisions and decision making.
As the name goes, it uses a tree-like model of decisions.

A decision tree is drawn upside down with its root at the top. A condition (internal node) is
a point at which the tree splits into branches (edges). The end of a branch that doesn’t
split anymore is the decision (leaf); in a passenger-survival example, the leaf states
whether the passenger died or survived.

A real dataset will have many more features, and such a tree would be just one branch of a
much bigger tree, but the simplicity of this algorithm cannot be ignored. The feature
importance is clear and relations can be viewed easily. This methodology is more commonly
known as learning a decision tree from data, and the tree described above is called a
classification tree because the target is to classify a passenger as survived or died.
Regression trees are represented in the same manner, except that they predict continuous
values such as the price of a house. In general, decision tree algorithms are referred to as
CART, or Classification and Regression Trees.

So, what is actually going on in the background? Growing a tree involves deciding which
features to choose and what conditions to use for splitting, along with knowing when to stop.
As a tree generally grows arbitrarily, it needs to be pruned to remain manageable.
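
For the investment problem itself, the decision-analysis tree has one decision node (which share to buy), one chance node per alternative, and one branch per scenario ending in a payoff. The sketch below renders that structure as indented text; it is only an illustration under assumed names (`render_tree`, `tree_lines`) and does not replace the software-drawn diagram the question asks for.

```python
# Payoffs in INR per share, one value per scenario (taken from the question).
payoffs = {
    "Reliance Industry Ltd.": [55, 43, 29, 15],
    "Maruti": [26, 38, 43, 51],
}


def render_tree(table):
    """Return the decision tree as a list of indented text lines."""
    lines = ["[Decision] Which share to buy?"]
    for alternative, row in table.items():
        lines.append(f"  +-- {alternative}")
        lines.append("      (Chance) state of nature")
        for number, value in enumerate(row, start=1):
            lines.append(f"        +-- Scenario {number}: payoff {value}")
    return lines


tree_lines = render_tree(payoffs)
print("\n".join(tree_lines))
```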
3. The probabilities assigned to the four scenarios (states of nature) from Niharika’s
research are as follows: P(s1) = 0.4, P(s2) = 0.1, P(s3) = 0.3, P(s4) = 0.2.

Expected monetary value for the decision alternative Reliance Industry Ltd.:

Contribution of scenario 1 = 55(0.4) = 22
Contribution of scenario 2 = 43(0.1) = 4.3
Contribution of scenario 3 = 29(0.3) = 8.7
Contribution of scenario 4 = 15(0.2) = 3
EMV = 22 + 4.3 + 8.7 + 3 = 38
The expected monetary value for investing in Reliance Industry Ltd. is Rs. 38

Expected monetary value for the decision alternative Maruti:

Contribution of scenario 1 = 26(0.4) = 10.4
Contribution of scenario 2 = 38(0.1) = 3.8
Contribution of scenario 3 = 43(0.3) = 12.9
Contribution of scenario 4 = 51(0.2) = 10.2
EMV = 10.4 + 3.8 + 12.9 + 10.2 = 37.3
The expected monetary value for investing in Maruti is Rs. 37.3

A decision maker using expected monetary value will choose the alternative with the
maximum expected monetary value:

Maximum of {38, 37.3} = 38

Conclusion:

Hence, we can conclude that the maximum expected monetary value is Rs. 38, and the decision
would be to invest in Reliance Industries Ltd.
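
The EMV comparison can be checked with a short script. This is a sketch under stated assumptions: the helper name `emv` and the variable names are illustrative, while the payoffs and probabilities are the ones given in the question.

```python
# Payoffs in INR per share and scenario probabilities (taken from the question).
payoffs = {
    "Reliance Industry Ltd.": [55, 43, 29, 15],
    "Maruti": [26, 38, 43, 51],
}
probabilities = [0.4, 0.1, 0.3, 0.2]


def emv(row, probs):
    """Probability-weighted average payoff of one alternative."""
    return sum(p * v for p, v in zip(probs, row))


emvs = {name: emv(row, probabilities) for name, row in payoffs.items()}
best = max(emvs, key=emvs.get)

for name, value in emvs.items():
    print(f"{name}: EMV = {value:.1f}")
print("EMV decision:", best)
```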
Q2. From the following data, check the correlation of ‘Migrants person’ (Migration
from Urban areas of J & K to another urban areas of J & K) with the below given
variables. Write your conclusion with respect to the correlation coefficient and Scatter
Diagram.
Draw the scatter plot (you may use EXCEL, SPSS, Python, R etc.)

Perform the correlation for the following pairs of variables


Migrant person numbers V/s ‘Number of Factory/Workshop/Work shed etc.’
Migrant person number V/s ‘Number of commercial establishments’
Migrant person number V/s ‘Number of towns’
Migrant person number V/s ‘Population per sq. km.’

(Note: no need to calculate correlation coefficient manually, use EXCEL formula, or
any other software)

(Note regarding the data access: You can copy the data from this pdf document and
paste it into your EXCEL workbook; you may have to work on alignment if it is
distorted in Excel.)

Data for the analysis

Districts     Migrant person   Number of Factory/        Number of commercial   Number of   Population
              numbers          Workshop/Workshed etc.    establishments         towns       per sq. km.

Kupwara       2667             188                       6571                   10          2212
Badgam        5370             273                       5552                   9           1996
Leh(Ladakh)   2621             176                       3898                   3           1902
Kargil        650              61                        1711                   1           7635
Punch         2038             67                        3605                   3           1604
Rajouri       4011             236                       4656                   4           2390
Kathua        11306            504                       6952                   6           2079
Baramula      19382            663                       12171                  7           2871
Bandipore     4866             156                       3258                   3           1317
Srinagar      94844            3575                      47986                  5           4141
Ganderbal     3041             231                       2418                   3           1852
Pulwama       7939             257                       4885                   5           2087
Shupiyan      1700             30                        1673                   1           3007
Anantnag      13545            919                       14079                  12          2880
Kulgam        2621             243                       3902                   7           1619
Doda          2297             76                        3016                   2           1655
Ramban        1171             41                        1768                   3           783
Kishtwar      569              45                        1765                   1           23595
Udhampur      10873            549                       6795                   6           2475
Reasi         2085             28                        3677                   5           692
Jammu         139422           2410                      46539                  20          3034
Samba         5349             685                       5162                   6           1383

Ans 2.

Introduction:

Correlation coefficients are used to assess the strength of the linear association between
two variables. The correlation coefficient is a statistical measure of how closely two sets
of values, such as predicted and actual values, move together.
A correlation value of 1 indicates that for every increase in one variable, there is a fixed
proportional increase in the other. A correlation coefficient of -1 shows that for every
increase in one variable, there is a fixed proportional decrease in the other. For example,
the amount of gas in a tank reduces in (nearly) perfect proportion to speed. Zero indicates
that there is no linear relationship: an increase in one variable is not associated with any
systematic rise or fall in the other.

Source: https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQPHJqexQb2-
D9mmV6B16DEgz9jtbWrIpWBbQ&usqp=CAU

A scatter diagram depicts the association amongst two variables. A scatter diagram is among
the most effective tools for displaying a non-linear pattern. A scatter diagram gives evidence
to support a claim that two variables are connected. It is basically a two-dimensional
rectangular coordinate graph with points representing the values of the two variables being
studied.

Source: https://encrypted-tbn0.gstatic.com/images?
q=tbn:ANd9GcRS3A61Ddjl2ruPvTWBIQtg6G52ytCl9pWxxg&usqp=CAU

Concept and Application:

Correlation coefficients quantify the degree of association between two variables. A
correlation between variables shows that when one variable's value changes, the other
variable tends to change in a specific direction. Understanding that relationship is
beneficial because we may use one variable's value to anticipate the value of the other
variable. Correlation coefficients are a quantitative measure of the direction and strength
of this tendency to change together. It is a numerical statistical metric that reflects the
magnitude and direction of the relationship between two variables. When two variables move
in the same direction, they are said to be positively correlated. They have a negative
correlation if they move in opposite directions. The goal of using correlations in research
is to determine which variables are linked.
Pearson's correlation coefficient is calculated by taking the covariance of two variables and
dividing it by the product of their standard deviations. It is commonly symbolised by ρ (rho):
$\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y}$
Scatter diagrams are useful graphical tools for investigating the relationship between two
random variables. They take the form of a plot with data points for the variables of
interest spread over it. Scatterplots are an excellent technique to quickly check for
correlation between two continuous variables.
Source: https://editor.analyticsvidhya.com/uploads/39170Formula.JPG

1. Using Excel, we can determine the correlation coefficient and obtain the following
data:

Districts     Migrant person   Number of factory/        Number of commercial   Number of
              numbers          Workshop/Workshed etc.    establishments         towns

Kupwara       2667             188                       6571                   10
Badgam        5370             273                       5552                   9
Leh(Ladakh)   2621             176                       3898                   3
Kargil        650              61                        1711                   1
Punch         2038             67                        3605                   3
Rajouri       4011             236                       4656                   4
Kathua        11306            504                       6952                   6
Baramula      19382            663                       12171                  7
Bandipore     4866             156                       3258                   3
Srinagar      94844            3575                      47986                  5
Ganderbal     3041             231                       2418                   3
Pulwama       7939             257                       4885                   5
Shupiyan      1700             30                        1673                   1
Anantnag      13545            919                       14079                  12
Kulgam        2621             243                       3902                   7
Doda          2297             76                        3016                   2
Ramban        1171             41                        1768                   3
Kishtwar      569              45                        1765                   1
Udhampur      10873            549                       6795                   6
Reasi         2085             28                        3677                   5
Jammu         139422           2410                      46539                  20
Samba         5349             685                       5162                   6

Pair of variables                                                          Correlation coefficient

Migrant person numbers V/s ‘Number of Factory/Workshop/Work shed etc.’     0.8973328316
Migrant person number V/s ‘Number of commercial establishments’            0.9665746707
Migrant person number V/s ‘Number of towns’                                0.6629394793
Migrant person number V/s ‘Population per sq. km.’                         -0.0179593907

Interpretation:

Migrant person numbers V/s ‘Number of Factory/Workshop/Work shed etc.’ = 0.8973328316.
It denotes a strong positive correlation between the two variables.

Migrant person number V/s ‘Number of commercial establishments’ = 0.9665746707. It
denotes a very strong positive correlation between the two variables, the strongest of the
four pairs.

Migrant person number V/s ‘Number of towns’ = 0.6629394793. It denotes a moderate
positive correlation between the two variables.

Migrant person number V/s ‘Population per sq. km.’ = -0.0179593907. The coefficient is
negative but so close to zero that it indicates practically no linear relationship between
the two variables.
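
The coefficients can also be recomputed outside Excel. The following is a minimal sketch of Pearson's formula in plain Python using the data table from the question; the function name `pearson`, the tuple layout and the short labels are illustrative assumptions. If the data were transcribed correctly, the printed values should be close to the Excel figures reported in this answer.

```python
from math import sqrt

# One tuple per district, in the order of the question's table:
# (migrants, factories/workshops, commercial establishments, towns, population per sq. km.)
data = [
    (2667, 188, 6571, 10, 2212), (5370, 273, 5552, 9, 1996),
    (2621, 176, 3898, 3, 1902), (650, 61, 1711, 1, 7635),
    (2038, 67, 3605, 3, 1604), (4011, 236, 4656, 4, 2390),
    (11306, 504, 6952, 6, 2079), (19382, 663, 12171, 7, 2871),
    (4866, 156, 3258, 3, 1317), (94844, 3575, 47986, 5, 4141),
    (3041, 231, 2418, 3, 1852), (7939, 257, 4885, 5, 2087),
    (1700, 30, 1673, 1, 3007), (13545, 919, 14079, 12, 2880),
    (2621, 243, 3902, 7, 1619), (2297, 76, 3016, 2, 1655),
    (1171, 41, 1768, 3, 783), (569, 45, 1765, 1, 23595),
    (10873, 549, 6795, 6, 2475), (2085, 28, 3677, 5, 692),
    (139422, 2410, 46539, 20, 3034), (5349, 685, 5162, 6, 1383),
]


def pearson(x, y):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


migrants = [row[0] for row in data]
labels = ["factories", "commercial", "towns", "density"]  # short labels, assumed
results = {
    label: pearson(migrants, [row[i] for row in data])
    for i, label in enumerate(labels, start=1)
}
for label, r in results.items():
    print(f"migrants vs {label}: r = {r:.4f}")
```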

2. Using Excel to plot a scatter diagram for each of the four pairs of variables, we get
the following graphs:
a. Scatter diagram for Migrant person numbers V/s ‘Number of
Factory/Workshop/Work shed etc.’

b. Scatter diagram for Migrant person number V/s ‘Number of commercial
establishments’

c. Scatter diagram for Migrant person number V/s ‘Number of towns’

d. Scatter diagram for Migrant person number V/s ‘Population per sq. km.’

Conclusion:

Hence, we can conclude that a scatterplot displays the strength, direction, and form of the
relationship between two quantitative variables. A correlation coefficient measures the
strength of that relationship. The correlation coefficients and scatter diagrams for all four
pairs of variables are given above with their interpretation.
Q3a. The number of customers who enter a ‘German’ supermarket-Gandhinagar
each hour is ‘normally’ distributed with a mean of 600 and a standard deviation of
200. The supermarket is open 16 hours per day. What is the probability that the total
number of customers who enter the supermarket in one day is greater than 10,000?
(Note: Show the stepwise calculation and write the interpretation based on the final
answer) (5 Marks)

Ans. 3(a)

Introduction:

Probability measures how likely an event is to occur. It also refers to the branch of
mathematics that deals with the occurrence of random events. Its value ranges from zero to
one. The probabilities of all outcomes in a sample space sum to one. When all outcomes are
equally likely, the probability of an event can be calculated by simply dividing the number
of favourable outcomes by the total number of possible outcomes. Since there can never be
more favourable outcomes than there are possible outcomes, and the number of favourable
outcomes cannot be negative, the probability of an event always lies between 0 and 1.

Source:
https://t4.ftcdn.net/jpg/00/73/62/39/360_F_73623950_JCrrWLXvdZJ0Gx56XAL3luNcaza25
FOF.jpg

The formula for probability:


Probability of event to happen P(E) = Number of favorable outcomes/Total Number of
outcomes.

Concept and Analysis:

Probability is the ratio of the number of favourable outcomes to the number of all possible
outcomes of an event. It measures how probable something is to occur. Probability is a
field of mathematics that deals with estimating the likelihood of a particular event. The
Z-score, also known as the standard score, measures how far a given value lies from the
mean, expressed as the number of standard deviations above or below the mean. The standard
deviation reflects the degree of variation within a particular data set.

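Given:
Mean per hour, μ = 600
Standard deviation per hour, σ = 200
Supermarket is open n = 16 hours per day

A daily total greater than 10,000 is equivalent to an hourly average greater than:

Formula:

X̄ = ( Σ xi ) / n

X̄ = 10000/16 = 625

Assuming the hourly counts are independent, the mean of 16 hourly counts is normally
distributed with mean μ = 600 and standard error σ/√n = 200/√16 = 50.

Formula:

z = (x̄ – μ) / (σ/√n)

where x̄ represents the sample mean, μ is the mean of population, σ represents the population
standard deviation, and n is the number of hours.

z = (625 - 600)/50 = 0.5

P(X̄ > 625) = P(z > 0.5) = 1 - 0.6915 = 0.3085

Equivalently, the daily total has mean 16 × 600 = 9600 and standard deviation
√16 × 200 = 800, so z = (10000 - 9600)/800 = 0.5.

Conclusion:

We can conclude that the probability that the total number of customers who enter the
‘German’ supermarket-Gandhinagar in one day is greater than 10,000 is 0.3085, i.e. roughly
a 31% chance on any given day.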
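
A quick numerical check of the daily-total probability is sketched below. It assumes the 16 hourly counts are independent, so that the daily total is normal with mean 16 × 600 and standard deviation √16 × 200; the variable names are illustrative, and `statistics.NormalDist` is the standard-library normal distribution.

```python
from math import sqrt
from statistics import NormalDist

mu_hour, sigma_hour, hours = 600, 200, 16   # figures given in the question
threshold = 10_000

# Sum of independent normal variables: means add, variances add.
mu_day = hours * mu_hour                    # mean of the daily total
sigma_day = sqrt(hours) * sigma_hour        # standard deviation of the daily total

z = (threshold - mu_day) / sigma_day
p_exceed = 1 - NormalDist().cdf(z)          # upper-tail probability P(Z > z)

print(f"mean = {mu_day}, sd = {sigma_day:.0f}, z = {z:.3f}")
print(f"P(total > {threshold}) = {p_exceed:.4f}")
```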

Q 3b. Shree Ganga Taploo University bookstore claims that 50% of its customers are
satisfied with the service and prices.

If this claim is true, what is the probability that in a random sample of 600 customers
less than 45% are satisfied with services and price?

(Note: Show the stepwise calculation and write the interpretation based on the final
answer) (5 Marks)

Ans 3b.

Introduction:

The term "probability" refers to the likelihood of a specific event (or group of events)
occurring, represented on a linear scale from 0 (impossibility) to 1 (certainty), or as a
percentage between 0 and 100 percent. Statistics is the study of occurrences governed by
probability. The more common understanding of probability is that it measures how
frequently outcomes occur, whereas Bayesians treat probability as a subjective degree of
belief that is updated with data. A z-score expresses the location of a raw score as its
distance from the mean, measured in standard deviation units. The z-score is positive if the
value is more than the mean and negative if the value is less than the mean. It is also known
as a standard score, since standardising the distribution allows results on different scales
to be compared.
Source: https://i2.wp.com/www.learncbse.in/wp-content/uploads/2019/04/Probability-
Formulas.png?fit=560%2C315&ssl=1

Concept and Analysis:

The probability of an event is the ratio of the number of cases favourable to it to the total
number of possible cases, provided nothing leads us to expect any one of these cases to
happen more than any other, so that all cases are equally likely.
The Z test is a statistical test performed on data that approximately follow a normal
distribution. For hypothesis testing, the z test can be applied to one sample, two samples,
or proportions. It is used when the population variance is known, or the samples are large,
to assess whether a sample mean or proportion differs from a hypothesised value, or whether
the means of two populations differ.
Based on the data given above, we have:

Sample size = 600 customers, n = 600
Probability of success = probability of a satisfied customer = 50%,
So P = 0.5
Probability of failure = 1-P = Q = 0.5

Number of successes corresponding to 45% of 600
= 45/100 × 600 = 270

Required probability
= P(X < 270)
= P(p < 270/600)
= P(p < 0.45)

where p is the sample proportion. For a large sample, p is approximately normally
distributed with mean P and standard deviation √(PQ/n).

Converting to the standard normal variable, we get:

z = (p - P) / √(PQ/n)

where p represents the sample proportion, P is the population proportion, and √(PQ/n) is the
standard deviation of the sample proportion.

$P\left(Z < \frac{0.45 - 0.5}{\sqrt{0.5 \times 0.5 / 600}}\right) = P\left(Z < \frac{-0.05}{0.0204}\right) = P(Z < -2.45)$

From the standard normal tables, P(Z < -2.45) = 0.0071

Conclusion:

From this, we can conclude that the probability that, in a random sample of 600 customers,
less than 45% are satisfied with services and prices is 0.0071. As this probability is far
below common thresholds such as 0.05, such a result would be unusual and rare if the
bookstore's claim of 50% satisfaction were true.
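
The normal approximation for the sample proportion can be verified numerically. This sketch uses no continuity correction, mirroring the hand calculation; the variable names are illustrative assumptions and the figures come from the question.

```python
from math import sqrt
from statistics import NormalDist

n = 600          # sample size
p_claim = 0.5    # claimed proportion of satisfied customers
cutoff = 0.45    # sample proportion of interest

# Standard deviation (standard error) of the sample proportion under the claim.
std_error = sqrt(p_claim * (1 - p_claim) / n)

z = (cutoff - p_claim) / std_error
prob = NormalDist().cdf(z)       # lower-tail probability P(Z < z)

print(f"standard error = {std_error:.4f}, z = {z:.2f}")
print(f"P(sample proportion < {cutoff}) = {prob:.4f}")
```

The unrounded z is slightly smaller in magnitude than -2.45, so the computed probability can differ from the table value in the fourth decimal place.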
