COMPUTATIONAL
TECHNIQUES FOR
DATA ANALYSIS
SEMESTER 6
A
PROJECT REPORT
ON
DATA ANALYSIS
SKILL ENHANCEMENT COURSE PAPER
SUBMITTED BY:
ROHAN THAKUR (19/21256)
SECTION - A
SUBMITTED TO:
MS. KRITIKA
B.A PROGRAMME
SEMESTER 6th
1. INTRODUCTION
In this project we have taken two variables: non-oil exports
and the currency exchange rate. Non-oil products are goods other
than crude oil and petroleum products. They include essential
natural products used as raw materials and components for the
production of other commodities, and items used in research for
advancing knowledge. They are used by manufacturers, scientists,
agriculturists, researchers and others. Examples are coal, zinc,
iron ore, copper and aluminium. An exchange rate is the value of
one nation's currency versus the currency of another nation or
economic zone.
The factors affecting non-oil exports include currency
exchange rates, political circumstances, productivity, trade
policies, inflation and demand. In this data we have taken
India's non-oil exports and the exchange rate of the Indian
rupee against the US dollar from 1970 to 2020, and from this
data we have tried to show the relationship between non-oil
exports and the exchange rate. Many factors affect non-oil
exports, but we have taken only one of them, the currency
exchange rate. Before that, we would like to give a gist of
the Indian economy. Before 1991 India had not adopted LPG
(liberalisation, privatisation and globalisation) and was a
closed economy. On 24 July 1991 India adopted liberalisation,
privatisation and globalisation and became an open economy.
Under its New Economic Policy, India approached international
financial institutions for the development of the country, and
these agencies asked the Indian Government to relax its
restrictions on trade by the private sector and between India
and other countries. After globalisation India started to
import from many different countries. Imports increased, but
exports remained lower than imports, and this contributed to
the depreciation of the Indian rupee. (In our data we compare
the Indian rupee with the US dollar.) Before 1991, $1 was equal
to between 7 and 8 rupees in 1970 and 1980, but after 1991 the
rupee depreciated much further: $1 was equal to about 44 rupees
in 2000 and about 74 rupees in 2020. This shows the devaluation
of the Indian rupee against the US dollar. Using our data on
non-oil exports and the US dollar exchange rate, we have tried
to show the relationship between the two for India from 1970
to 2020.
2. DESCRIPTIVE STATISTICS:- Descriptive statistics are
brief descriptive coefficients that summarise a given data set,
which can be either a representation of the entire population
or a sample of a population. Descriptive statistics are broken
down into measures of central tendency and measures of
variability (spread). Measures of central tendency include the
mean, median and mode, while measures of variability include
the standard deviation, variance, and the minimum and maximum
values (the range); skewness and kurtosis describe the shape
of the distribution.
Non-oil Exports
Mean 440776.4902
Standard Error 89514.76003
Median 104836
Mode --
Standard Deviation 639263.2521
Sample Variance 408657505460
Kurtosis 0.4098832745
Skewness 1.377539523
Range 1980270
Minimum 1527
Maximum 1981797
Sum 22479601
Count 51
Largest(1) 1981797
Smallest(1) 1527
Confidence Level(95%) 179795.6834
Coefficient of Variation 1.450311589
Non-oil Exports
Measures Number
Mean 440776.4902
Median 104836
Mode --
Harmonic mean 11074.87078
Geometric Mean 74826.86709
Relationship AM>GM>HM
2.4 MEDIAN
The median is often used to overcome a drawback of the mean,
namely its sensitivity to extreme values. The median is the
middlemost value, or central value, of the distribution when
the data are sorted in ascending or descending order.
The median divides the data set into two halves: one contains
values greater than the median and the other contains values
less than the median, so 50% of the values lie above the
median and 50% lie below it.
In our data the median of non-oil exports is 104836, the value
for 1995.
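To make the "middlemost value" concrete, here is a minimal Python sketch (Python is our choice here; the report itself works in Excel). The helper `median_position` is hypothetical; `statistics.median` is the standard-library function.

```python
import statistics


def median_position(n):
    """Return the 1-based sorted position(s) that define the median of n values."""
    if n % 2 == 1:
        return ((n + 1) // 2,)       # odd n: a single middle value
    return (n // 2, n // 2 + 1)      # even n: the two middle values are averaged


# 51 yearly observations (1970 to 2020): the median is the 26th sorted value,
# with 25 observations below it and 25 above it.
print(median_position(51))              # (26,)

# With an even count, statistics.median averages the two middle values.
print(statistics.median([4, 1, 3, 2]))  # 2.5
```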
2.5 MODE
The mode is a measure of central tendency that identifies the
value with the highest frequency. It represents the most
common, most frequently occurring value in the data set. The
mode is useful when we want to know which value occurs most
often.
In our data the mode of non-oil exports is nil (Excel returns
#N/A, shown as "--" in the table), which means that no value of
exports occurs more than once in our data.
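When every value in a series is distinct, Excel's =MODE returns #N/A, which the tables show as "--". The sketch below reproduces that rule with `collections.Counter`; the helper name `mode_or_none` is hypothetical and `None` stands in for #N/A. The input lists are toy values, not the report's data.

```python
from collections import Counter


def mode_or_none(values):
    """Return the most frequent value, or None when no value occurs more than once.

    None plays the role of Excel's #N/A; ties go to the value encountered first,
    because Counter.most_common orders equal counts by first appearance.
    """
    if not values:
        return None
    value, frequency = Counter(values).most_common(1)[0]
    return value if frequency > 1 else None


print(mode_or_none([10, 20, 30]))     # None: every value occurs once
print(mode_or_none([7.5, 7.5, 8.0]))  # 7.5
```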
2.6 RANGE
The range is the simplest measure of dispersion. It is the
difference between the highest and the lowest values in the
data set, so it is calculated by subtracting the minimum value
from the maximum value. A higher range implies greater
variation, or spread, in the data.
RANGE=Maximum value of the data - Minimum value of the data
In our data the range of non-oil exports is 1980270
(1981797 - 1527), which is the difference between the highest
and the lowest values of non-oil exports.
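The range formula is a single subtraction. This Python sketch applies it to the minimum and maximum values reported in the descriptive-statistics tables of this report. The function name `data_range` is our own.

```python
def data_range(values):
    """Range = maximum value - minimum value."""
    return max(values) - min(values)


# Minimum and maximum values reported in the descriptive-statistics tables
print(data_range([1527, 1981797]))              # 1980270 (non-oil exports)
print(round(data_range([7.5244, 74.0996]), 4))  # 66.5752 (US dollar rate)
```

Because only the two extreme values enter the formula, changing any other observation leaves the range unchanged.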
2.8 VARIANCE
Variance refers to the average of the squared deviations from
the mean. In simple words, it is the square of the standard
deviation (equivalently, the standard deviation is the square
root of the variance).
The sample variance of our non-oil exports data is
408657505460.295.
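A minimal Python sketch of the sample variance, dividing by n - 1 as Excel's =VAR does. The toy list is illustrative only. The last line checks the "square of the standard deviation" relationship against the non-oil exports figures in the table.

```python
import math
import statistics


def sample_variance(values):
    """Average squared deviation from the mean, dividing by n - 1 as Excel's =VAR does."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


toy = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_variance(toy))                                          # about 4.5714
print(math.isclose(sample_variance(toy), statistics.variance(toy)))  # True

# The variance is the square of the standard deviation: squaring the reported
# standard deviation of non-oil exports reproduces the reported sample variance.
print(math.isclose(639263.2521 ** 2, 408657505460.295, rel_tol=1e-9))  # True
```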
2.11 KURTOSIS
Kurtosis is another measure of descriptive statistics. It
describes the shape of a distribution's tails in comparison with
the normal distribution; it is a measure of the ‘tailedness’ of
a distribution. Because it is based on the fourth power of the
deviations from the mean, the value of kurtosis is strongly
influenced by extreme values or outliers.
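The sketch below assumes the sample excess-kurtosis formula used by Excel's =KURT, under which a normal distribution scores about 0. By that yardstick the non-oil exports value of 0.4098832745 is slightly leptokurtic, and the US dollar value of -1.242719095 is platykurtic. The function name `excel_kurt` is our own.

```python
import statistics


def excel_kurt(values):
    """Excess kurtosis using the sample formula behind Excel's =KURT.

    About 0 for a normal distribution; positive values suggest heavier tails
    (leptokurtic), negative values lighter tails (platykurtic).
    """
    n = len(values)
    if n < 4:
        raise ValueError("KURT needs at least four values")
    mean = statistics.fmean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1)
    fourth = sum(((x - mean) / s) ** 4 for x in values)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))) * fourth - (
        3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    )


print(round(excel_kurt([1, 2, 3, 4, 5]), 6))  # -1.2 (flat, evenly spread data)
```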
2.12 HISTOGRAM
A histogram is a graphical representation that organises a
group of data points into user-specified ranges. Similar in
appearance to a bar graph, the histogram condenses a data
series into an easily interpreted visual by taking many data
points and grouping them into logical ranges or bins.
Non-oil Exports
Bin Frequency
200000 32
400000 3
600000 3
800000 2
1000000 1
1200000 1
1400000 1
1600000 3
1800000 2
2000000 3
More 0
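This Python sketch counts values into bins. It assumes Excel's Histogram convention: each bin counts values above the previous limit, up to and including its own limit, and "More" collects anything above the last limit. The helper `histogram_counts` and the toy values are illustrative, not the report's data.

```python
from bisect import bisect_left


def histogram_counts(values, bin_upper_limits):
    """Frequency per bin; the final slot plays the role of Excel's "More" row."""
    limits = sorted(bin_upper_limits)
    counts = [0] * (len(limits) + 1)
    for x in values:
        # bisect_left finds the first limit >= x, so a value equal to a limit
        # is counted in that limit's bin.
        counts[bisect_left(limits, x)] += 1
    return counts


# Toy exchange-rate values with 10-rupee bin limits like the US dollar table
print(histogram_counts([7.5, 10, 12.3, 35.2, 91.0], [10, 20, 30, 40, 50, 60, 70, 80]))
# [2, 1, 0, 1, 0, 0, 0, 0, 1]
```

The counts always add up to the number of observations, which is a quick check on a histogram table: 51 in both tables of this report.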
US Dollar exchange rate (rupees per US dollar)
Mean 32.23668627
Standard Error 2.98737919
Median 32.4232
Mode --
Standard Deviation 21.33415468
Sample Variance 455.1461558
Kurtosis -1.242719095
Skewness 0.2880062721
Range 66.5752
Minimum 7.5244
Maximum 74.0996
Sum 1644.071
Count 51
Largest(1) 74.0996
Smallest(1) 7.5244
Confidence Level(95%) 6.000327575
Coefficient of Variation 0.661797385
US Dollar
Measures Number
Mean 32.23668627
Median 32.4232
Mode --
Harmonic mean 17.66617079
Geometric Mean 24.30470469
Relationship AM>GM>HM
As shown in the data, the relationship between the arithmetic
mean, geometric mean and harmonic mean of the US dollar
exchange rate is:
AM>GM>HM
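The ordering AM ≥ GM ≥ HM holds for any set of positive numbers, with equality only when all values are the same, so it is expected for both series in this report. This Python sketch uses the standard `statistics` module. The input values are illustrative round numbers, not the report's series.

```python
import statistics


def three_means(values):
    """Arithmetic, geometric and harmonic means of strictly positive data."""
    return (
        statistics.fmean(values),
        statistics.geometric_mean(values),
        statistics.harmonic_mean(values),
    )


am, gm, hm = three_means([8, 18, 45, 74])  # illustrative values only
print(round(am, 2), round(gm, 2), round(hm, 2))
print(am > gm > hm)  # True whenever the values are positive and not all equal
```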
3.4 MEDIAN
Unlike the mean, the median is not affected by extreme values
in the data set. Thus it is a better measure of central
tendency for a skewed distribution. In our data the median of
the US dollar exchange rate is 32.4232. In Excel we use
=MEDIAN to find the median of a data set.
3.5 MODE
The mode is not affected by extreme values in the data set,
and it can be calculated for both quantitative and categorical
data (data measured on a nominal scale). In our data the mode
of the US dollar exchange rate is nil, which means no value
occurs more than once in our data; this can also be shown in
Excel with the formula =MODE.
3.6 RANGE
The range is not based on all the values in the data set (only
the maximum and the minimum), so it does not facilitate further
analysis.
RANGE=Maximum value of the data - Minimum value of the data
In our data the range of the US dollar exchange rate is
66.5752.
3.8 VARIANCE
Variance describes the spread of the data. It is a useful
statistic for drawing insights from data. The sample variance
of our US dollar exchange rate data is 455.146. To find the
sample variance in Excel we type =VAR.
3.10 SKEWNESS
In our data the skewness of the US dollar exchange rate is
0.2880062721. Since it lies between 0 and 0.5, the distribution
is fairly symmetrical. The positive sign shows that the
distribution is slightly positively (right) skewed: the longer
tail, made up of the higher exchange rates of fewer, later
years, lies on the right side. To find skewness in Excel we
type =SKEW.
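This Python sketch assumes the adjusted sample-skewness formula used by Excel's =SKEW. A symmetric data set scores about 0, and a long right tail gives a positive value, as with the US dollar series. The function name `excel_skew` and the inputs are illustrative.

```python
import statistics


def excel_skew(values):
    """Adjusted sample skewness, the formula behind Excel's =SKEW."""
    n = len(values)
    if n < 3:
        raise ValueError("SKEW needs at least three values")
    mean = statistics.fmean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1)
    third = sum(((x - mean) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * third


print(excel_skew([1, 2, 3, 4, 5]))       # approximately 0 (symmetric)
print(excel_skew([1, 1, 2, 2, 10]) > 0)  # True (long right tail)
```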
3.11 KURTOSIS
In our data the kurtosis of the US dollar exchange rate is
-1.242719095. Since it is negative, the distribution is
platykurtic: it is flatter than a normal distribution, with
fewer items concentrated near the mean. To find kurtosis in
Excel we type =KURT.
3.12 HISTOGRAM
A histogram is a graphical representation that organises a
group of data points into user-specified ranges.
Histogram of US Dollar exchange rate:
US Dollar
Bin Frequency
10 13
20 8
30 2
40 5
50 14
60 2
70 5
80 2
More 0
In our data the histogram of the US dollar exchange rate is
platykurtic and mildly right (positively) skewed. Before the
1991 reforms India had not adopted globalisation and the
exchange rate stayed low, which is why the values from 1970 to
1980 pile up in the lowest bin (13 years at 10 rupees or less).
India's economy has grown dramatically since it integrated
into the global economy in 1991, with a drastic impact on
India's economic condition: its average annual growth rate
rose from 3.5% (1980-1990) to 7.7% (2002-2012). Over the same
period the rupee depreciated, and values above 50 rupees occur
in only 9 of the 51 years, forming the longer tail on the right
side.
3. COVARIANCE
Covariance is a statistical measure that analyzes the linear
relationship between two random variables. It evaluates how
the two variables vary together, or covary: when one variable
goes up, goes down or remains constant, what happens to the
other variable. Accordingly, we can have the following types
of linear relationship or covariance.
1. Positive covariance - It indicates a direct relationship
between the variables, that is, the two variables tend to move
together in the same direction, either upward or downward.
2. Negative covariance - It indicates an inverse relationship
between the variables; the two variables tend to move in
opposite directions.
3. Zero covariance - It indicates that there is no linear
relationship between the two variables. If two variables are
independent, their covariance is zero, although a zero
covariance alone does not rule out a non-linear pattern.
In our data the X variable is the US dollar exchange rate and
the Y variable is non-oil exports. The covariance between X
and Y is 11469084, which is positive and so indicates a direct
relationship.
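Covariance can be computed by dividing by n (population) or by n - 1 (sample). The reported 11469084 matches the population version, which is what Excel's =COVARIANCE.P returns. The Python sketch below shows both. Its last lines recover the figure from the report's own slope and variance of X, using the identity slope = sample covariance / sample variance of X. The function name `covariance` and the toy lists are our own.

```python
def covariance(xs, ys, sample=True):
    """Covariance of paired data; sample=True divides by n - 1, otherwise by n."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least two points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / (n - 1) if sample else cross / n


print(covariance([1, 2, 3], [2, 4, 6]))                # 2.0 (sample)
print(covariance([1, 2, 3], [2, 4, 6], sample=False))  # about 1.3333 (population)

# Cross-check with the report's figures: sample cov = b1 * sample var(X),
# and rescaling by (n - 1) / n gives the population covariance.
n = 51
sample_cov = 25702.65733 * 455.1461558
population_cov = sample_cov * (n - 1) / n
print(round(population_cov))  # 11469084
```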
4. CORRELATION
Correlation is a statistical technique that measures both the
direction and the strength of the linear relationship between
the variables. The study of correlation is important as it
quantifies the degree of association between the variables
and shows how strongly the variables are related. If two
variables are correlated, then when one variable increases or
decreases, the other also changes, either in the same or in the
opposite direction, and the correlation also tells us how
strong this relationship is. However, correlation does not
necessarily mean causation. That is, correlation in no way
confirms that variable X causes variable Y to vary or vice
versa.
The correlation coefficient measures the degree of association
between the variables. It expresses the strength of the
association, that is, whether it is strong, weak or
non-existent. The value of the coefficient of correlation lies
between -1 and +1. The positive or negative sign indicates the
direction of the relationship, that is, a direct or an inverse
relation. The closer the value of the coefficient is to +1 or
-1, the stronger the direct or inverse relationship.
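A minimal Python sketch of the Pearson correlation coefficient (Excel's =CORREL computes the same quantity). The helper `pearson_r` and the toy lists are illustrative. The last lines cross-check against the report's own figures using r = b1 × s_X / s_Y. Because the fitted slope is positive, r equals the Multiple R of 0.8577756742 in the regression output, a strong direct relationship.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired data (between -1 and +1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3], [2, 4, 6]))  # 1.0 (perfect direct relationship)
print(pearson_r([1, 2, 3], [6, 4, 2]))  # -1.0 (perfect inverse relationship)

# Cross-check from the report's figures: r = b1 * s_x / s_y
r = 25702.65733 * 21.33415468 / 639263.2521
print(round(r, 4))  # 0.8578
```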
5. REGRESSION ANALYSIS
Regression analysis refers to a statistical technique that
quantifies the relationship between variables and predicts
the value of one variable from one or more other variables.
The variable that predicts the value of the other variable is
known as the independent variable, explanatory variable,
predictor or simply the X variable. Likewise, the variable
whose value is predicted is known as the dependent variable,
variable of interest or the Y variable.
Linear regression analysis essentially models the relationship
between a dependent and an independent variable by fitting
a linear equation.
Regression analysis can be linear or non-linear based on the
nature of relationship between variables.
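Mean of X (US dollar exchange rate) 32.23668627
Mean of Y (non-oil exports) 440776.4902
B1 (slope) 25702.65733
B0 (intercept) -387792.0105
Estimated line: Y = -387792.0105 + 25702.65733 X
Y (B0 + B1 × 1644.071, where 1644.071 is the sum of X) 41869201.53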
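The slope and intercept above come from ordinary least squares. This Python sketch fits a line for one explanatory variable. It then checks the reported intercept using B0 = mean of Y - B1 × mean of X, which holds because a least-squares line passes through the point of means. The function name `fit_simple_ols` and the toy data are our own.

```python
def fit_simple_ols(xs, ys):
    """Least-squares line y = b0 + b1 * x for one explanatory variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx             # slope
    b0 = mean_y - b1 * mean_x  # intercept: the line passes through (mean_x, mean_y)
    return b0, b1


print(fit_simple_ols([1, 2, 3], [3, 5, 7]))  # (1.0, 2.0)

# Cross-check the reported intercept from the reported means and slope.
b0 = 440776.4902 - 25702.65733 * 32.23668627
print(round(b0, 2))  # -387792.01
```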
TYPES OF ERRORS:-
1. Type 1 error:- When we reject the null hypothesis when
it is true.
2. Type 2 error:- When we do not reject the null hypothesis
when it is false.
In this data the X variable is the US dollar exchange rate and
the Y variable is non-oil exports. With the help of the Data
Analysis ToolPak in Excel we have run a regression of Y on X to
find the relationship between the two variables. The null
hypothesis is that there is no linear relationship between X
and Y, that is, the slope B1 is zero.
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.8577756742
R Square 0.7357791073
Adjusted R Square 0.7303868442
Standard Error 331932.8843
Observations 51
ANOVA
             df   SS               MS               F             Significance F
Regression    1   15034082727541   15034082727541   136.4508911   ≈ 0
Residual     49   5398792545473    110179439704
Total        50   20432875273015
α level = 0.05
P-VALUE < α
The p-value (Significance F, effectively 0) is smaller than the
significance level of 0.05, so we reject the null hypothesis.
This means there is a significant relationship between the X
variable and the Y variable.
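The F statistic behind Significance F can be recomputed from R Square, or directly from the ANOVA mean squares. This Python sketch does both for n = 51 and one explanatory variable. The p-value itself is not computed here. In Excel, =F.DIST.RT(136.4508911, 1, 49) would give it, and in Python, scipy.stats.f.sf(136.4508911, 1, 49) would. The function name `f_statistic` is our own.

```python
def f_statistic(r_squared, n, k=1):
    """Overall F statistic of a regression with k explanatory variables and n observations."""
    return (r_squared / k) / ((1 - r_squared) / (n - k - 1))


print(round(f_statistic(0.7357791073, 51), 2))  # 136.45

# The same value from the ANOVA table: F = MS(regression) / MS(residual)
print(round(15034082727541 / 110179439704, 2))  # 136.45
```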
6. R SQUARE
In simple linear regression, R Square is calculated by squaring
the correlation coefficient between the two variables. The
value of R Square ranges between 0 and 1. In our data set,
R Square is 0.735779107279952, which means that about 73.58%
of the variation in the dependent variable (non-oil exports)
can be explained by the independent variable (US dollar
exchange rate).
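R Square can be checked two ways from the regression output: as SS(regression) / SS(total) from the ANOVA table, and as the square of Multiple R. This Python sketch uses the figures printed in the report; both routes agree.

```python
ss_regression = 15034082727541  # SS column, Regression row of the ANOVA table
ss_total = 20432875273015       # SS column, Total row
multiple_r = 0.8577756742       # Multiple R from the regression statistics

r_squared_from_anova = ss_regression / ss_total
r_squared_from_r = multiple_r ** 2

print(round(r_squared_from_anova, 6))  # 0.735779
print(round(r_squared_from_r, 6))      # 0.735779
```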
7. ADJUSTED R SQUARE
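Adjusted R Square corrects R Square for the number of
explanatory variables in the model:
ADJUSTED R SQUARE = 1 - (1 - R Square) × (n - 1)/(n - k - 1)
where n is the number of observations and k is the number of
explanatory variables. Since (n - 1)/(n - k - 1) is greater than
1 whenever k ≥ 1, the term subtracted from 1 is larger than
(1 - R Square), so adjusted R Square comes out smaller than
R Square (unless R Square equals 1). The penalty grows as more
explanatory variables are added to the model.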
In our data, R Square is greater than adjusted R Square:
R SQUARE = 0.735779107279952
ADJUSTED R SQUARE = 0.730386844163216
0.735779107279952 > 0.730386844163216
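Plugging the report's numbers into the adjusted R Square formula (n = 51 observations, k = 1 explanatory variable) reproduces the ToolPak value. Here is a minimal Python sketch; the function name `adjusted_r_squared` is our own.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


# n = 51 observations, k = 1 explanatory variable (the US dollar exchange rate)
print(round(adjusted_r_squared(0.735779107279952, 51, 1), 6))  # 0.730387
```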
CONCLUSION
From the given data we have used different tools to find the
relationship between the X and Y variables, the US dollar
exchange rate and non-oil exports. We used descriptive
statistics to find the mean, median, mode, skewness and
kurtosis, and we also applied correlation, covariance,
regression analysis, R Square and adjusted R Square. All these
approaches point to the same conclusion, and the hypothesis
test leads us to reject the null hypothesis. This indicates a
significant relationship between the X variable (US dollar
exchange rate) and the Y variable (non-oil exports).