Payoff (profit within one month on one unit of share, in INR)
                          Scenario 1   Scenario 2   Scenario 3   Scenario 4
Reliance Industry ltd.        55           43           29           15
Maruti                        26           38           43           51
2) Draw the decision tree (Note: You may use any software for making tree
diagram, but snapshot of handwritten tree will be unacceptable)
(Note: Do mention the decision based on the analysis clearly in few sentences.)
(10 Marks)
Ans 1.
Introduction:
Opportunity loss is the gap between the maximum payoff attainable in a given state of nature
and the payoff actually realized for the action chosen. In a nutshell, it is the loss suffered
as a result of failing to take the best available course of action. Opportunity losses are
computed independently for each state of nature: for every course of action, subtract its
payoff from the maximum payoff available in that state of nature. The opportunity loss of a
course of action under a given state of nature is also known as the conditional opportunity
loss. The expected monetary value (EMV) of every course of action can be calculated using the
payoffs and the probabilities of the various states of nature.
Source: https://qsstudy.com/wp-content/uploads/2018/07/Opportunity-Cost-3.jpg
A decision tree is a structured, graph-like diagram that illustrates the potential outcomes of
a given input using a branching mechanism. Decision trees can be hand-drawn or generated
using a graphics application or specialist software. Whenever a panel wants to evaluate
several alternatives and make a judgement, decision trees can help focus the debate. They can
be used to portray decisions and decision making graphically and unambiguously. As the name
suggests, a decision tree uses a tree-like model. A decision tree diagram is a form of
flowchart that lays out the available courses of action to simplify decision-making, and it
also depicts the possible results of each course of action. Decision trees give an effective
structure for laying out choices and investigating the potential results of those
alternatives. In machine learning, decision trees are built by iteratively assessing the
features and using the one that best separates the data at every node.
Source: https://image.shutterstock.com/image-vector/decision-tree-icon-data-analysis-
260nw-1215849010.jpg
The EMV for a specific course of action is the weighted average payoff: the sum, over all
states of nature, of the payoff in each state multiplied by the probability of that state
occurring.
Formula:
Expected Monetary Value (EMV) = Σ (Probability of each outcome × Payoff of that outcome)
Source: https://encrypted-tbn0.gstatic.com/images?
q=tbn:ANd9GcTrq5hJP36_ZTkGdrUeRmoMFbbNhRz_V44-pA&usqp=CAU
Opportunity loss is the loss suffered as a result of failing to choose the best available
course of action. This decision criterion is based on lost opportunity, or regret: the amount
of money forgone by not selecting the best option in a particular state of nature is referred
to as opportunity loss or regret. The minimax regret criterion uses the opportunity loss table
to find the alternative whose maximum opportunity loss is the smallest.
In a decision tree analysis, expected monetary value analysis makes it simpler to estimate
risks, estimate the contingency reserve, and find the optimum option. During this procedure,
the risk attitude should be neutral; otherwise, the estimate may be distorted. Furthermore,
the accuracy of this analysis depends on the input data, so a complete data quality
evaluation should be carried out. This strategy boosts confidence in meeting the
organization's goal.
In projects, the Expected Monetary Value can be used to compare risks. It computes the
average outcome whenever the forecast contains uncertain events, which can be either
favourable (opportunities) or unfavourable (threats). Threats are represented as negative
numbers, while opportunities are represented as positive numbers.
Payoff (profit within one month on one unit of share, in INR)
                          Scenario 1   Scenario 2   Scenario 3   Scenario 4
Reliance Industry ltd.        55           43           29           15
Maruti                        26           38           43           51
Opportunity loss = maximum payoff in the scenario - payoff of the alternative (in INR)
                          Scenario 1   Scenario 2   Scenario 3   Scenario 4
Reliance Industry ltd.     (55-55)      (43-43)      (43-29)      (51-15)
Maruti                     (55-26)      (43-38)      (43-43)      (51-51)
Opportunity Loss Table (profit within one month on one unit of share, in INR)
                          Scenario 1   Scenario 2   Scenario 3   Scenario 4
Reliance Industry ltd.         0            0           14           36
Maruti                        29            5            0            0
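The opportunity loss table can be reproduced with a short script. This is a minimal sketch, assuming the payoffs from the table in the question; the names `payoffs`, `opportunity_loss` and `minimax_choice` are illustrative and do not come from the assignment.

```python
# Payoffs (INR per share) for Scenario 1..4, copied from the payoff table.
payoffs = {
    "Reliance Industry Ltd.": [55, 43, 29, 15],
    "Maruti": [26, 38, 43, 51],
}


def opportunity_loss(payoff_table):
    """Regret = best payoff in each scenario minus the alternative's payoff."""
    best_per_scenario = [max(column) for column in zip(*payoff_table.values())]
    return {
        name: [best - value for best, value in zip(best_per_scenario, row)]
        for name, row in payoff_table.items()
    }


regret = opportunity_loss(payoffs)

# Minimax regret: pick the alternative whose worst-case regret is smallest.
max_regret = {name: max(row) for name, row in regret.items()}
minimax_choice = min(max_regret, key=max_regret.get)

for name, row in regret.items():
    print(name, row, "max regret:", max_regret[name])
print("Minimax regret choice:", minimax_choice)
```

The minimax regret criterion ignores the probabilities of the scenarios, so its choice can differ from the one obtained with expected monetary value, which weights each scenario by its probability.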
2. Decision Tree: Decision tree learning is a supervised learning approach used in statistics, data mining
and machine learning. In this formalism, a classification or regression decision tree is
used as a predictive model to draw conclusions about a set of observations. Tree models
where the target variable can take a discrete set of values are called classification trees; in
these tree structures, leaves represent class labels and branches represent conjunctions of
features that lead to those class labels. Decision trees where the target variable can take
continuous values (typically real numbers) are called regression trees. Decision trees are
among the most popular machine learning algorithms given their intelligibility and
simplicity. In decision analysis, a decision tree can be used to visually and explicitly
represent decisions and decision making. In data mining, a decision tree describes data
(but the resulting classification tree can be an input for decision making).
A tree has many analogies in real life, and it turns out that it has influenced a wide area of
machine learning, covering both classification and regression. As the name suggests, a
decision tree uses a tree-like model of decisions.
A decision tree is drawn upside down with its root at the top. Each condition/internal node is
a test on which the tree splits into branches/edges. The end of a branch that doesn’t split
anymore is the decision/leaf, for example whether a passenger died or survived. A real dataset
will have a lot more features, and such a tree will be just a branch in a much bigger tree,
but you can’t ignore the simplicity of this algorithm. The feature importance is clear and
relations can be viewed easily. This methodology is more commonly known as learning a decision
tree from data, and a tree like this is called a classification tree because the target is to
classify a passenger as survived or died. Regression trees are represented in the same manner,
except that they predict continuous values such as the price of a house. In general, decision
tree algorithms are referred to as CART, or Classification and Regression Trees.
So, what is actually going on in the background? Growing a tree involves deciding which
features to choose and what conditions to use for splitting, along with knowing when to stop.
As a tree generally grows arbitrarily, it needs to be trimmed down (pruned) to stay simple.
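3. The probabilities of the four states of nature assigned from Niharika’s research are as
follows: P(s1) = 0.4, P(s2) = 0.1, P(s3) = 0.3, P(s4) = 0.2.
Expected monetary value for the decision alternative Reliance Industry Ltd.:
EMV = (55 × 0.4) + (43 × 0.1) + (29 × 0.3) + (15 × 0.2) = 22 + 4.3 + 8.7 + 3 = Rs. 38
Expected monetary value for the decision alternative Maruti:
EMV = (26 × 0.4) + (38 × 0.1) + (43 × 0.3) + (51 × 0.2) = 10.4 + 3.8 + 12.9 + 10.2 = Rs. 37.3
A decision maker using expected monetary value will choose the alternative with the maximum
expected monetary value, which here is Reliance Industry Ltd. (Rs. 38 > Rs. 37.3).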
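The expected monetary value of each alternative can be checked with a few lines of code. This is a minimal sketch, assuming the payoff table from the question and the probabilities P(s1) to P(s4) stated above; the helper name `emv` is illustrative.

```python
# Payoffs (INR per share) for Scenario 1..4 and the stated scenario probabilities.
payoffs = {
    "Reliance Industry Ltd.": [55, 43, 29, 15],
    "Maruti": [26, 38, 43, 51],
}
probabilities = [0.4, 0.1, 0.3, 0.2]


def emv(row, probs):
    """Expected monetary value: probability-weighted sum of the payoffs."""
    return sum(p * value for p, value in zip(probs, row))


emv_by_alternative = {name: emv(row, probabilities) for name, row in payoffs.items()}
best_alternative = max(emv_by_alternative, key=emv_by_alternative.get)

for name, value in emv_by_alternative.items():
    print(f"{name}: EMV = Rs. {value:.1f}")
print("Choose:", best_alternative)
```

With these inputs the script reports Rs. 38.0 for Reliance Industry Ltd. and Rs. 37.3 for Maruti, so the EMV criterion selects Reliance.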
Conclusion:
Hence, we can conclude that the maximum expected monetary value is Rs. 38, and the decision
would be to invest in Reliance Industries Ltd.
Q2. From the following data, check the correlation of ‘Migrants person’ (Migration
from Urban areas of J & K to other urban areas of J & K) with the below given
variables. Write your conclusion with respect to the correlation coefficient and Scatter
Diagram
Draw the scatter plot (you may use EXCEL, SPSS, Python, R etc.)
(Note regarding the data access: You can copy the data from this pdf document and
paste it into your EXCEL workbook; you may have to work on alignment if it is
distorted in Excel.)
Ans 2.
Introduction:
Correlation coefficients are used to assess the strength of the association between two
variables. The correlation coefficient is a statistical measure of how closely two sets of
values, such as predicted and actual values, move together.
A correlation value of 1 indicates that for every increase in one variable, there is a fixed
proportional increase in the other. A correlation coefficient of -1 shows that for every
increase in one variable, there is a fixed proportional decrease in the other. For example,
the amount of gas in a tank reduces in (nearly) perfect proportion to speed. Zero indicates
that there is no linear relationship: an increase in one variable is not associated with any
consistent increase or decrease in the other.
Source: https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQPHJqexQb2-
D9mmV6B16DEgz9jtbWrIpWBbQ&usqp=CAU
A scatter diagram depicts the association amongst two variables. A scatter diagram is among
the most effective tools for displaying a non-linear pattern. A scatter diagram gives evidence
to support a claim that two variables are connected. It is basically a two-dimensional
rectangular coordinate graph with points representing the values of the two variables being
studied.
Source: https://encrypted-tbn0.gstatic.com/images?
q=tbn:ANd9GcRS3A61Ddjl2ruPvTWBIQtg6G52ytCl9pWxxg&usqp=CAU
Correlation coefficients quantify the degree of association between two variables. Two
variables are correlated when a change in one variable's value tends to be accompanied by a
change in the other in a specific direction. Understanding that relationship is beneficial
because we may use one variable's value to anticipate the value of the other variable.
Correlation coefficients are a quantitative measure of the direction and strength of this
tendency to change together. When two variables move in the same direction, they are said to
be positively correlated. They have a negative correlation if they move in opposite
directions. The goal of using correlations in research is to determine which variables are
linked.
Pearson's correlation coefficient is calculated by taking the covariance of two variables and
dividing it by the product of their standard deviations. It is commonly symbolised by ρ (rho):
ρ(X,Y) = cov(X,Y) / (σX × σY)
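Pearson's coefficient can also be computed directly from its definition. The sketch below is illustrative only: the function name `pearson_r` and the two short lists are hypothetical sample values, not the district data from the assignment, which is not reproduced in this document.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the std deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The common 1/n (or 1/(n-1)) factors cancel, so raw sums are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical illustrative values, NOT the assignment's data.
migrants = [120, 340, 560, 610, 900]
establishments = [15, 40, 62, 70, 101]
print(round(pearson_r(migrants, establishments), 4))
```

The function raises a division-by-zero error if either list is constant, since the coefficient is undefined when a variable has no variation.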
Scatter Diagrams are useful mathematical tools for investigating the relationship amongst the
two random variables. They take the form of a sheet of paper with data points pertaining to
the variables of interest spread over it. Scatterplots are an excellent technique to immediately
test for correlation between two sets of continuous variables.
Source: https://editor.analyticsvidhya.com/uploads/39170Formula.JPG
1. Using Excel, we can determine the correlation coefficient and obtain the following
data:
Districts | Migrant person numbers | Number of factory/Workshop/Work shed etc. | Number of commercial establishments | Number of towns
Variable pair                                                              Correlation coefficient
Migrant person numbers V/s ‘Number of Factory/Workshop/Work shed etc.’     0.8973328316
Migrant person number V/s ‘Number of commercial establishments’            0.9665746707
Migrant person number V/s ‘Number of towns’                                0.6629394793
Migrant person number V/s ‘Population - per sq. km.’                       0.0179593907
Interpretation:
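Migrant person number V/s ‘Number of commercial establishments’ (0.9666) and V/s ‘Number of
Factory/Workshop/Work shed etc.’ (0.8973) show strong positive correlations, and V/s ‘Number
of towns’ (0.6629) shows a moderate positive correlation.
Migrant person number V/s ‘Population per sq. km.’ = -0.0179593907. The coefficient is
negative, but it is so close to zero that there is practically no linear relationship between
the two variables.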
2. Using Excel to plot the scatter diagrams for all 4 variable pairs, we get the following
graphs:
a. Scatter diagram for Migrant person numbers V/s ‘Number of
Factory/Workshop/Work shed etc.’
Conclusion:
Hence, we can conclude that a scatterplot displays the strength, direction, and form of the
relationship between two quantitative variables. A correlation coefficient measures the
strength of that relationship. The correlation coefficients and scatter diagrams of all four
variable pairs are given above with their interpretation.
Q3a. The number of customers who enter a ‘German’ supermarket-Gandhinagar
each hour is ‘normally’ distributed with a mean of 600 and a standard deviation of
200. The supermarket is open 16 hours per day. What is the probability that the total
number of customers who enter the supermarket in one day is greater than 10,000?
(Note: Show the stepwise calculation and write the interpretation based on the final
answer) (5 Marks)
Ans. 3(a)
Introduction:
Probability expresses how likely an event is to occur. It also refers to the branch of
mathematics that deals with the occurrence of random events. Its value ranges from zero to
one. The probabilities of all outcomes in a sample space sum to one. When all outcomes are
equally likely, the probability of an event can be calculated by dividing the number of
favourable outcomes by the total number of possible outcomes. Since there can never be more
favourable outcomes than possible outcomes, and the number of favourable outcomes cannot be
negative, the probability of an event always lies between 0 and 1.
Source:
https://t4.ftcdn.net/jpg/00/73/62/39/360_F_73623950_JCrrWLXvdZJ0Gx56XAL3luNcaza25
FOF.jpg
Probability is the ratio of the number of favourable outcomes to the number of all possible
outcomes of an event. It measures how probable something is to occur. The Z-score, also known
as the standard score, measures how far a given value lies from the mean: it is the number of
standard deviations above or below the mean. The standard deviation reflects the degree of
variation within a particular data set.
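Given:
Mean (μ) = 600 customers per hour
Standard deviation (σ) = 200
The supermarket is open n = 16 hours per day
Formula:
X̄ = ( Σ xi ) / n
A daily total of more than 10,000 customers corresponds to an hourly average of more than
X̄ = 10000/16 = 625
Formula:
z = (x̄ - μ) / (σ / √n)
where x̄ represents the sample mean, μ is the mean of the population, σ represents the
population standard deviation, and n is the number of hours. Assuming the hourly counts are
independent, the standard error of the hourly average is σ / √n = 200 / √16 = 50.
z = (625 - 600) / 50 = 0.5
P(X̄ > 625) = P(Z > 0.5) = 1 - 0.6915 = 0.3085
Conclusion:
We can conclude that the probability that the total number of customers who enter the
‘German’ supermarket-Gandhinagar in one day is greater than 10,000 is 0.3085, i.e. this
happens on roughly 31% of days.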
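The calculation can be cross-checked numerically. This is a minimal sketch, assuming the 16 hourly counts are independent and normally distributed, so that the daily total is normal with mean 16 × 600 = 9600 and standard deviation 200 × √16 = 800; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu_hour = 600      # mean customers per hour
sigma_hour = 200   # standard deviation per hour
hours = 16         # opening hours per day
threshold = 10_000

# Sum of independent normal variables: means add, variances add.
daily_total = NormalDist(mu=hours * mu_hour, sigma=sigma_hour * math.sqrt(hours))

z = (threshold - daily_total.mean) / daily_total.stdev
p_exceeds = 1 - daily_total.cdf(threshold)

print(f"z = {z:.2f}, P(total > {threshold}) = {p_exceeds:.4f}")
# prints: z = 0.50, P(total > 10000) = 0.3085
```

Working with the daily total or with the hourly average gives the same z-score, because both are scaled by the same factor of 16.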
Q 3b. Shree Ganga Taploo University bookstore claims that 50% of its customers are
satisfied with the service and prices.
If this claim is true, what is the probability that in a random sample of 600 customers
less than 45% are satisfied with services and price?
(Note: Show the stepwise calculation and write the interpretation based on the final
answer) (5 Marks)
Ans 3b.
Introduction:
The term "probability" refers to the likelihood of a specific event (or group of events)
occurring, represented on a linear scale from 0 (impossibility) to 1 (certainty), or as a
percentage between 0 and 100 percent. Statistics is the study of occurrences governed by
probability. The more common understanding of probability is that it is a measure of how
often outcomes occur, whereas Bayesians treat probability as a subjective degree of belief
that is updated using observed data. When measured in standard deviation units, a z-score
represents the location of a raw score in terms of its distance from the mean. The z-score is
positive if the value is more than the mean and negative if the value is less than the mean.
It is also known as a standard score, since standardising the distribution allows comparison
of results measured on different scales.
Source: https://i2.wp.com/www.learncbse.in/wp-content/uploads/2019/04/Probability-
Formulas.png?fit=560%2C315&ssl=1
The probability of an event is the ratio of the number of cases favourable to it to the total
number of possible cases, provided nothing leads us to expect any one of these cases to happen
more than any other, so that they are all equally possible. The Z test is a statistical test
performed on data that approximately follows a normal distribution. For hypothesis testing,
the z test can be applied to one sample, two samples, or proportions. When the population
variance is known, a two-sample z test determines whether the means of two large samples
differ.
Based on the data given above, with n = 600 and claimed proportion p = 0.5, we have:
P(X < 270)
= P(p̂ < 270/600)
= P(p̂ < 0.45)
Formula:
z = (p̂ - p) / √( p × (1 - p) / n )
where p̂ represents the sample proportion, p is the claimed population proportion, and n is
the sample size.
P(Z < (0.45 - 0.5) / √(0.5 × 0.5 / 600))
= P(Z < -2.45)
From the standard normal tables, this probability is approximately 0.0072.
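The same figure can be reproduced in code. This is a minimal sketch, assuming the normal approximation to the sampling distribution of a proportion without a continuity correction; the variable names are illustrative.

```python
import math
from statistics import NormalDist

p_claimed = 0.5   # proportion of satisfied customers claimed by the bookstore
n = 600           # sample size
p_cutoff = 0.45   # sample proportion of interest

# Standard error of the sample proportion under the claimed value.
standard_error = math.sqrt(p_claimed * (1 - p_claimed) / n)
z = (p_cutoff - p_claimed) / standard_error

# P(sample proportion < 0.45) from the standard normal CDF.
probability = NormalDist().cdf(z)

print(f"z = {z:.2f}, P(p_hat < {p_cutoff}) = {probability:.4f}")
```

Because z is not rounded to -2.45 before the lookup, the computed value can differ from a printed table in the fourth decimal place.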
Conclusion:
From this, we can conclude that the probability that less than 45% of a random sample of 600
customers are satisfied with services and prices is 0.0072. As 0.0072 is far below 0.05, such
a sample would be unusual and rare if the bookstore's claim were true.