Fill page numbers as per your theory assignment in the table below:

Question No.   Sec.A 1-10   Sec.B-1   Sec.B-2   Sec.B-3   Sec.B-4   Sec.B-5   Sec.B-6   Sec.B-7
Page No.       2-8          8-10      10-17     17-19     19-22     22-23

Page 1 of 27
D21MBA11764
Assignment No. 2
Page 3 of 27
D21MBA14694
Section – A
Answer: - Actuarial science became a formal mathematical discipline in the late 17th century with the increased demand for long-term insurance coverage. It spans several interrelated subjects, including mathematics, probability theory, statistics, finance, economics, and computer science. Historically, actuarial science used deterministic models in the construction of tables and premiums. In the last 30 years, the discipline has undergone revolutionary changes due to the proliferation of high-speed computers and the union of stochastic actuarial models with modern financial theory. Statistics plays an important role in actuarial work and in using the modeling tools that are currently available. The Casualty Actuarial Society (CAS) must choose the specific topics to be tested.
Answer: -The only difference between one-way and two-way ANOVA is the number of
independent variables. A one-way ANOVA has one independent variable, while a two-way
ANOVA has two.
One-way ANOVA: testing the relationship between shoe brand (Nike, Adidas, Saucony, and Hoka) and race finish times in a marathon.
Two-way ANOVA: testing the relationship between shoe brand (Nike, Adidas, Saucony, Hoka), runner age group (junior, senior, masters), and race finish times in a marathon.
All ANOVAs are designed to test for differences among three or more groups. If you are only
testing for a difference between two groups, use a t-test instead.
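The one-way case can be sketched numerically. This is a minimal illustration with invented finish times (in minutes) for three hypothetical shoe brands; it computes the F-statistic from its definition rather than calling a statistics library.

```python
# One-way ANOVA sketch: do mean finish times differ across three shoe brands?
# The finish-time data below are invented for illustration only.

brand_a = [240, 245, 250]  # finish times (minutes) for hypothetical brand A
brand_b = [250, 255, 260]
brand_c = [270, 275, 280]

groups = [brand_a, brand_b, brand_c]
n_total = sum(len(g) for g in groups)
k = len(groups)
grand_mean = sum(sum(g) for g in groups) / n_total

# Between-group sum of squares: variation of group means around the grand mean.
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: variation of observations around their group mean.
ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

# F = between-group mean square / within-group mean square
f_stat = (ss_between / (k - 1)) / (ss_within / (n_total - k))
print(round(f_stat, 6))  # 28.0
```

A large F (here 28.0 on 2 and 6 degrees of freedom) would lead to rejecting the hypothesis of equal group means. A two-way ANOVA would add a second factor (age group) and partition the variation further.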
Answer: -Central tendency is defined as “the statistical measure that identifies a single value as
representative of an entire distribution.” It aims to provide an accurate description of the entire
data. It is the single value that is most typical/representative of the collected data. The term
“number crunching” is used to illustrate this aspect of data description. The mean, median and
mode are the three commonly used measures of central tendency but under different conditions,
some measures of central tendency become more appropriate to use than others. In the following
sections, we will look at the mean, mode and median, and learn how to calculate them and under
what conditions they are most appropriate to be used.
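The three measures can be computed with Python's standard statistics module; the small data set here is invented to show how one outlier (10) pulls the mean away from the median and mode.

```python
import statistics

data = [1, 2, 2, 3, 10]  # invented sample with one outlier

mean = statistics.mean(data)      # arithmetic average
median = statistics.median(data)  # middle value of the ordered data
mode = statistics.mode(data)      # most frequently observed value

print(mean, median, mode)  # 3.6 2 2
```

The outlier drags the mean up to 3.6 while the median and mode stay at 2, which is why the choice among the three measures depends on the conditions of the data.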
Answer: - There are two main types of dispersion measures in statistics: absolute and relative.
An absolute measure of dispersion is expressed in the same unit as the original data set. Absolute dispersion methods express the variation in terms of the average deviation of the observations, as in the mean deviation or standard deviation. They include the range, standard deviation, quartile deviation, etc.
The relative measures of dispersion are used to compare the distribution of two or more data sets.
This measure compares values without units. Common relative dispersion methods include:
1. Coefficient of Range
2. Coefficient of Variation
3. Coefficient of Standard Deviation
4. Coefficient of Quartile Deviation
5. Coefficient of Mean Deviation
Answer: - Skewness can be measured using several methods; however, Pearson mode skewness and Pearson median skewness are the two most frequently used. Pearson mode skewness is used when the sample data exhibit a strong mode. If the data include multiple modes or a weak mode, Pearson median skewness is used.
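Both Pearson measures can be sketched directly from their usual definitions, mode skewness = (mean − mode) / standard deviation and median skewness = 3(mean − median) / standard deviation; the data here are invented.

```python
import statistics

data = [1, 2, 2, 3, 10]  # invented right-skewed sample

mean = statistics.mean(data)
stdev = statistics.stdev(data)  # sample standard deviation

# Pearson mode skewness: suitable when the data show one strong mode.
mode_skew = (mean - statistics.mode(data)) / stdev

# Pearson median skewness: suitable for weak or multiple modes.
median_skew = 3 * (mean - statistics.median(data)) / stdev

print(mode_skew > 0, median_skew > 0)  # True True
```

Both measures come out positive for this sample, reflecting the long right tail created by the outlier.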
1. negative skew: The left tail is longer; the mass of the distribution is concentrated on the
right of the figure. The distribution is said to be left-skewed, left-tailed, or skewed to the
left, despite the fact that the curve itself appears to be skewed or leaning to the
right; left instead refers to the left tail being drawn out and, often, the mean being skewed
to the left of a typical center of the data. A left-skewed distribution usually appears as
a right-leaning curve.
2. positive skew: The right tail is longer; the mass of the distribution is concentrated on the
left of the figure. The distribution is said to be right-skewed, right-tailed, or skewed to
the right, despite the fact that the curve itself appears to be skewed or leaning to the
left; right instead refers to the right tail being drawn out and, often, the mean being
skewed to the right of a typical center of the data. A right-skewed distribution usually
appears as a left-leaning curve.
1. When the correlation coefficient is +1, every increase in one variable is accompanied by an increase of a fixed proportion in the other. For example, shoe sizes change according to the length of the feet and show a perfect (almost) correlation.
2. When the correlation coefficient is -1, every increase in one variable is accompanied by a decrease of a fixed proportion in the other. For example, the decrease in the quantity of gas in a gas tank shows a perfect (almost) inverse correlation with speed.
3. When the correlation coefficient is 0, an increase in one variable implies no increase or decrease in the other; the two variables are not related.
SECTION - B
Quantitative data are data about numeric variables (e.g. how many; how much; or how often).
Qualitative data are measures of 'types' and may be represented by a name, symbol, or a number code.
Qualitative data are data about categorical variables (e.g. what type).
Data collected about a numeric variable will always be quantitative and data collected about a
categorical variable will always be qualitative. Therefore, you can identify the type of data, prior
to collection, based on whether the variable is numeric or categorical.
Quantitative and qualitative data provide different outcomes, and are often used together to get a
full picture of a population. For example, if data are collected on annual income (quantitative),
occupation data (qualitative) could also be gathered to get more detail on the average annual
income for each type of occupation.
Quantitative and qualitative data can be gathered from the same data unit depending on
whether the variable of interest is numerical or categorical. For example:
Data unit    Numeric variable = Quantitative data                          Categorical variable = Qualitative data
A person     "How many children do you have?"  4 children                  "In which coun…"
             "How much do you earn?"  $60,000 p.a.                         "What is your o…"
             "How many hours do you work?"  38 hours per week              "Do you work f…"
A house      "How many square metres is the house?"  200 square metres     "In which city o…"
A business   "How many workers are currently employed?"  264 employees     "What is the in…"
A farm       "How many milk cows are located on the farm?"  36 cows        "What is the m…"
It is important to identify whether the data are quantitative or qualitative as this affects the
statistics that can be produced.
Frequency counts:
The number of times an observation occurs (frequency) for a data item (variable) can be shown for both
quantitative and qualitative data.
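Frequency counts for either kind of data reduce to counting repeated observations, which collections.Counter does directly. The occupation and children values below are invented.

```python
from collections import Counter

# Qualitative data: occupation categories.
occupations = ["teacher", "nurse", "teacher", "farmer", "nurse", "teacher"]
# Quantitative data: number of children per person.
children = [2, 0, 1, 2, 2, 3]

occ_freq = Counter(occupations)    # frequency of each category
child_freq = Counter(children)     # frequency of each numeric value

print(occ_freq.most_common(1))  # [('teacher', 3)]
print(child_freq[2])            # 3
```

The same counting operation works for both variable types, which is why frequency distributions can be shown for quantitative and qualitative data alike.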
The graphs below arrange the quantitative and qualitative data to show the frequency distribution of the
data.
[Graph: frequency distribution of quantitative data]
[Graph: frequency distribution of qualitative data]
Statistics that describe or summarize can be produced for quantitative data and to a lesser extent
for qualitative data.
As quantitative data are always numeric they can be ordered, added together, and the frequency
of an observation can be counted. Therefore, all descriptive statistics can be calculated using
quantitative data.
As qualitative data represent individual (mutually exclusive) categories, the descriptive statistics
that can be calculated are limited, as many of these techniques require numeric values which can
be logically ordered from lowest to highest and which express a count.
The mode can be calculated, as it is the most frequently observed value. The median, measures of shape, and measures of spread such as the range and interquartile range require an ordered data set with a logical low-end value and high-end value. Variance and standard deviation require the mean to be calculated, which is not appropriate for categorical variables as they have no numerical value.
Inferential statistics:
By making inferences about quantitative data from a sample, estimates or projections for the total
population can be produced.
Quantitative data can be used to inform broader understandings of a population, or to consider how that
population may change or progress into the future.
For example, a simple income projection for an employee in 2015 may be inferred from the rate of change
for data collected in 2000, 2005, and 2010.
As shown in the graph below, data collected over time indicates a 5% increase every five years.
Therefore, if the rate of increase continues to follow the same pattern, it can be projected that the annual
income for that employee in 2015 will be $46,305; which is the 2010 wage of $44,100 increased by an
additional 5%.
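The projection in the example is just compound growth at 5% per five-year period; a sketch using the figures quoted above (the 2000 wage of $40,000 is implied by working the 5% steps backwards from $44,100):

```python
# Wage series from the example: a 5% increase every five years.
wage_2010 = 44100.0

# Project 2015 by applying one more 5% step.
wage_2015 = wage_2010 * 1.05
print(round(wage_2015, 2))  # 46305.0

# Equivalently, starting from the implied 2000 wage of $40,000:
wage = 40000.0
for _ in range(3):  # steps to 2005, 2010, 2015
    wage *= 1.05
print(round(wage, 2))  # 46305.0
```

Both routes reproduce the $46,305 figure stated in the text, on the assumption that the 5% rate of increase continues.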
Qualitative data are not compatible with inferential statistics as all techniques are based on
numeric values.
Some examples will clarify the difference between discrete and continuous variables.
Suppose the fire department mandates that all fire fighters must weigh between 150 and
250 pounds. The weight of a fire fighter would be an example of a continuous variable;
since a fire fighter's weight could take on any value between 150 and 250 pounds.
Suppose we flip a coin and count the number of heads. The number of heads could be any integer value between 0 and plus infinity. However, it could not take just any value in that interval: we could not, for example, get 2.3 heads. Therefore, the number of heads must be a discrete variable.
1. Textual presentation
2. Data tables
A table facilitates representation of even large amounts of data in an attractive, easy to read and
organized manner. The data is organized in rows and columns. This is one of the most widely used
forms of presentation of data since data tables are easy to construct and read.
Table Number: Each table should have a specific table number for ease of access and
locating. This number can be readily mentioned anywhere which serves as a reference and
leads us directly to the data mentioned in that particular table.
Title: A table must contain a title that clearly tells the readers about the data it contains, time
period of study, place of study and the nature of classification of data.
Headnotes: A headnote further aids in the purpose of a title and displays more information
about the table. Generally, headnotes present the units of data in brackets at the end of a table
title.
Stubs: These are the titles of the rows in a table. Thus a stub displays information about the data
contained in a particular row.
Caption: A caption is the title of a column in the data table. In fact, it is the counterpart of a
stub and indicates the information contained in a column.
Body or field: The body of a table is the content of a table in its entirety. Each item in a body
is known as a ‘cell’.
Footnotes: Footnotes are rarely used. In effect, they supplement the title of a table if
required.
There are many ways for construction of a good table. However, some basic ideas are:
The title should be in accordance with the objective of study: The title of a table should
provide a quick insight into the table.
Comparison: If a need might arise to compare any two rows or columns, these should be kept close to each other.
Alternative location of stubs: If the rows in a data table are lengthy, then the stubs can be
placed on the right-hand side of the table.
Headings: Headings should be written in a singular form. For example, ‘good’ must be used
instead of ‘goods’.
Ease of representation: A large amount of data can be easily confined in a data table.
Evidently, it is the simplest form of data presentation.
Ease of analysis: Data tables are frequently used for statistical analysis like calculation of
central tendency, dispersion etc.
Helps in comparison: In a data table, the rows and columns which are required to be
compared can be placed next to each other. To point out, this facilitates comparison as it
becomes easy to compare each value.
Economical: Construction of a data table is fairly easy and presents the data in a manner
which is really easy on the eyes of a reader. Moreover, it saves time as well as space.
Classification of Data and Tabular Presentation
Qualitative Classification
In this classification, data in a table is classified on the basis of qualitative attributes. In other words, if the data contain attributes that cannot be quantified, like rural-urban or boys-girls, it can be identified as a qualitative classification of data. For example, a table with the stub "Sex" and the captions "Urban" and "Rural".
Quantitative Classification
Here the data is classified on the basis of quantifiable attributes, for example class intervals with their frequencies:
0-50      29
51-100    64
Spatial Classification
Here the data is classified on the basis of location, for example:
India     139,000
Russia    43,000
(i) A good table must contain all the essential parts, such as, Table number, Title, Head note,
Caption, Stub, Body, Foot note and source note.
(ii) A good table should be simple to understand. It should also be compact, complete and self-
explanatory.
(iii) A good table should be of proper size. There should be proper space for rows and columns.
One table should not be overloaded with details. Sometimes it is difficult to present entire data in
a single table. In that case, data are to be divided into more number of tables.
(iv) A good table must have an attractive get-up. It should be prepared in such a manner that a
scholar can understand the problem without any strain.
(vi) In all tables the captions and stubs should be arranged in some systematic manner. The
manner of presentation may be alphabetically, or chronologically depending upon the
requirement.
(viii) The figures should be rounded off to the nearest hundred, or thousand or lakh. It helps in
avoiding unnecessary details.
(ix) Percentages and ratios should be computed. Percentage of the value for item to the total must
be given in parenthesis just below the value.
(x) In case of non-availability of information, one should write N.A. or indicate it by dash (-).
(xi) Ditto marks should be avoided in a table. Similarly the expression ‘etc’ should not be used
in a table
Answer: - Dispersion: In statistics, dispersion tells you how the data is spread: when the set has large deviations, the points are widely scattered; when it has small deviations, the points are tightly clustered.
Measures of Dispersion: Measures of dispersion describe the spread of the data around the central value or mean. Measures of dispersion are divided into two parts: absolute and relative.
2. Variance: Variance is the average squared distance from the mean, and it treats all deviations the same regardless of their direction. Its drawback is that, because it squares the deviations, it is hard to interpret; it is also affected by outliers in the data.
3. Standard Deviation: Standard deviation is the square root of the variance. It gives more clarity about the deviation of the data from the mean, but it is also affected by outliers.
4. Quartile Deviation: Quartile deviation is half of the difference between the upper and lower quartiles, and it is less affected by outliers.
1. Coefficient of Range: The coefficient of range is the ratio of the difference between the highest and lowest values in the distribution to the sum of the highest and lowest values.
2. Coefficient of Variation: The coefficient of variation is the ratio of the standard deviation to the mean, and it measures the relative spread of the data points around the mean. The metric is commonly used to compare the dispersion between two distinct series. The coefficient of variation cannot be computed if the mean of the dataset is zero, and it can be misleading if the dataset has both positive and negative values. Use the coefficient of variation if the values are on a ratio scale; do not use it if the values are on an interval scale. It is also affected by outliers in the dataset; if the data contain negative values, use absolute measures of dispersion instead.
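The absolute and relative measures named above can be sketched with Python's statistics module. The data set is invented, and population formulas are used here so the numbers come out round; sample formulas (variance, stdev) differ only in the divisor.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # invented sample

# Absolute measures: expressed in the same units as the data.
data_range = max(data) - min(data)           # Range = H - L
variance = statistics.pvariance(data)        # mean squared deviation from the mean
stdev = statistics.pstdev(data)              # square root of the variance
q1, q2, q3 = statistics.quantiles(data, n=4) # quartiles (default 'exclusive' method)
quartile_dev = (q3 - q1) / 2                 # half the interquartile range

# Relative measures: unit-free, used to compare different series.
mean = statistics.mean(data)
coeff_of_range = (max(data) - min(data)) / (max(data) + min(data))
coeff_of_variation = stdev / mean            # undefined when the mean is zero

print(data_range, variance, stdev, quartile_dev)
print(coeff_of_range, coeff_of_variation)
```

For this sample the range is 7, the variance 4, the standard deviation 2, the quartile deviation 1.25, and the coefficient of variation 0.4.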
Meaning of Variability:
Variability means 'scatter' or 'spread'. Thus measures of variability refer to the scatter or spread of scores around their central tendency. The measures of variability indicate how the scores are spread out in the distribution.
From the following example we can get a clear idea of the concept of measures of variability:
The figure shows two frequency distributions of the same area (N) and the same mean (50) but of very different variability. Group A ranges from 20 to 80 and Group B from 40 to 60; Group A is three times as variable as Group B, spreading over three times the distance on the scale of scores.
Definitions of Variability:
“Dispersion or spread is the degree of the scatter or variation of the variables about a
central value.” Thus the property which denotes the extent to which the values are dispersed
about the central values is called dispersion. It also indicates the lack of uniformity in the size of
items of a distribution.
Measures of Variability:
These are:
1. The Range:
The range is the difference between the highest and lowest scores in a series. It is the most general measure of spread or scatter. It is a measure of the variability of the observations among themselves and does not give an idea about the spread of the observations around some central value.
Range = H - L
where H = highest score and L = lowest score. If the range is higher, the group shows more heterogeneity; if the range is lower, the group shows more homogeneity. Thus the range provides us an instant, though rough, measure of variability.
2. The Quartile Deviation (Q):
Next to the range, the quartile deviation is another measure of variability. It is based upon the interval containing the middle fifty percent of cases in a given distribution. One quarter means 1/4th of something, when a scale is divided into four equal parts. "The quartile deviation or Q is one-half the scale distance between the 75th and 25th percentiles in a frequency distribution."
From figure 9.2 we find that the 1st quartile or Q1 is the point in a distribution below which 25% of cases lie and above which 75% of cases lie. The 2nd quartile or Q2 is the point below and above which 50% of cases lie; it is the median. The 3rd quartile or Q3 is the 75th percentile, below which 75% of cases lie and above which 25% of cases lie. So the quartile deviation (Q) is one half the scale distance between the 3rd quartile (Q3) and the 1st quartile (Q1). It is also known as the Semi-Interquartile Range.
Symbolically:
Q = (Q3 - Q1) / 2
Therefore, in order to compute the quartile deviation, first of all we have to compute the 1st quartile (Q1) and the 3rd quartile (Q3).
5. The following data relate to the test scores obtained by eight salesmen in an aptitude test and their daily sales in thousands of rupees:

Score (X)  Sales (Y)  dx = X - 62  dy = Y - 30  dx·dy  dx²  dy²
62         26          0           -4            0      0    16
56         24         -6           -6           36     36    36
62         30          0            0            0      0     0
64         35          2            5           10      4    25
70         28          8           -2          -16     64     4
54         24         -8           -6           48     64    36
Total                -13          -14           90    221   122
B = 24+35/2 =30
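The totals in the table are the ingredients of Karl Pearson's correlation coefficient computed from assumed means (62 for scores, 30 for sales). A sketch of the final step, using the totals shown and N = 8 salesmen:

```python
import math

# Totals from the table (deviations taken from assumed means 62 and 30).
n = 8
sum_dx, sum_dy = -13, -14
sum_dxdy, sum_dx2, sum_dy2 = 90, 221, 122

# Karl Pearson's r from assumed-mean totals:
# r = (N*Σdxdy - Σdx*Σdy) / sqrt((N*Σdx² - (Σdx)²) * (N*Σdy² - (Σdy)²))
numerator = n * sum_dxdy - sum_dx * sum_dy
denominator = math.sqrt((n * sum_dx2 - sum_dx ** 2) * (n * sum_dy2 - sum_dy ** 2))
r = numerator / denominator
print(round(r, 3))  # 0.482
```

The result, roughly 0.48, indicates a moderate positive correlation between aptitude scores and daily sales.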
Answer: -
In statistical modeling, regression analysis is a set of statistical processes for estimating
the relationships between a dependent variable (often called the 'outcome' or 'response'
variable) and one or more independent variables (often called 'predictors', 'covariates',
'explanatory variables' or 'features'). The most common form of regression analysis is
linear regression, in which one finds the line (or a more complex linear combination) that
most closely fits the data according to a specific mathematical criterion. For example, the
method of ordinary least squares computes the unique line (or hyperplane) that minimizes
the sum of squared differences between the true data and that line (or hyperplane). For
specific mathematical reasons (see linear regression), this allows the researcher to
estimate the conditional expectation (or population average value) of the dependent
variable when the independent variables take on a given set of values. Less common
forms of regression use slightly different procedures to estimate alternative location
parameters (e.g., quantile regression or Necessary Condition Analysis) or to estimate the
conditional expectation across a broader collection of non-linear models (e.g.,
nonparametric regression).
Regression analysis is primarily used for two conceptually distinct purposes. First,
regression analysis is widely used for prediction and forecasting, where its use has
substantial overlap with the field of machine learning. Second, in some situations
regression analysis can be used to infer causal relationships between the independent and
dependent variables. Importantly, regressions by themselves only reveal relationships
between a dependent variable and a collection of independent variables in a fixed dataset.
To use regressions for prediction or to infer causal relationships, respectively, a
researcher must carefully justify why existing relationships have predictive power for a
new context or why a relationship between two variables has a causal interpretation. The
latter is especially important when researchers hope to estimate causal relationships using
observational data.
Most regression analysis is done to carry out processes in finance. So, here are 5 applications of regression analysis in the field of finance and others relating to it.
1. Forecasting:
The most common use of regression analysis in business is for forecasting future opportunities and threats. Demand analysis, for example, forecasts the number of items a consumer is likely to purchase.
When it comes to business, though, demand is not the only dependent variable. Regression analysis can forecast much more: for example, the highest bid an advertisement will attract, a customer's creditworthiness, and the amount of claims that might be filed in a particular time period.
2. CAPM:
The Capital Asset Pricing Model (CAPM), which establishes the link between an asset's projected return and the related market risk premium, relies on the linear regression model. The beta coefficient of a stock is calculated using regression analysis. Beta is a measure of how volatile the stock is relative to the market as a whole. Because beta reflects the slope of the CAPM regression, we can rapidly calculate it in Excel.
3. Comparing with competition:
It can be used to compare a firm's performance with that of a specific counterpart, and to determine the relationship between two firms' stock prices. It can assist the firm in determining which aspects are influencing their sales in contrast to the comparative firm. These techniques can assist small enterprises in achieving rapid success.
4. Identifying problems:
Regression is useful not just for providing factual evidence for management choices, but also for identifying errors in judgment. A retail store manager, for example, may assume that extending shopping hours will significantly increase sales. However, regression analysis might suggest that the increase in income isn't enough to cover the increase in operational cost as a result of longer working hours (such as additional employee labour charges). As a result, this research may give quantitative backing for choices and help managers avoid decisions based on intuition alone.
5. Reliable source:
Many businesses and their top executives are now adopting regression analysis (and other types of statistical analysis) to make better business decisions and reduce guesswork. Regression enables firms to take a scientific approach to management; both small and large businesses can benefit from it. Managers may use regression analysis to filter through data and choose the relevant factors for decision-making.
Financial industry: understand the trend in stock prices, forecast prices, and evaluate risks in the insurance domain.
Marketing: understand the effectiveness of marketing campaigns, and forecast pricing and sales of the product.
Manufacturing: evaluate the relationships among the variables that define a better engine, to provide better performance.
Medicine: forecast different combinations of medicines to prepare generic medicines for diseases.