
INFOSYS SHARE PRICES

GROUP MEMBERS

12301943- SANTANU MAITY

12301949- DHIVYA T

12301950- AKSHAY

12301991- KRITIKA MISHRA

12301995- SOURAV KUMAR RAM

12302002- PRANVI DUTTA


PEER RATING

NAME               REG. NO.   ROLL NO.   RATING
SANTANU MAITY      12301943   52
DHIVYA T           12301949   53         10
AKSHAY             12301950   54         10
KRITIKA MISHRA     12301991   55         10
SOURAV KUMAR RAM   12301995   56
PRANVI DUTTA       12302002   57         10

Learning Outcomes:
While completing this assignment, we came across various methods of analysing data and various techniques for evaluating the collected data, which can further be used in writing the conclusions and recommendations of the assignment.

Declaration:
I declare that this Assignment is my individual work. I have not copied it from any other student's work
or from any other source except where due acknowledgement is made explicitly in the text, nor has any
part been written for me by any other person.

GENERAL OBSERVATION    SUGGESTION FOR IMPROVEMENT    BEST PART OF ASSIGNMENT

Evaluator's Signature and Date:


ABSTRACT:
This paper analyzes Infosys's share price using a variety of statistical tools, including correlation, regression, index numbers, and forecasting methods. The goal of this analysis is to give investors a better understanding of the factors that drive Infosys's share price and to develop forecasting models that can be used to predict the share price in the future. The data extracted covers March 16, 1999 to May 25, 1999. The report shows that the share price fluctuated significantly over this period, with a high of $0.8125 on May 25, 1999, and a low of $0.613281 on April 21, 1999. The average closing price was $0.674272.
INTRODUCTION
Infosys is a multinational information technology company that provides business consulting,
information technology and outsourcing services. It is headquartered in Bangalore, India and
has over 300,000 employees worldwide. Infosys is one of the largest IT companies in India and is
ranked among the top 50 IT companies in the world.
Infosys's stock price is traded on the National Stock Exchange of India (NSE) and the Bombay
Stock Exchange (BSE). The stock has been volatile in recent years, but it has generally trended
upwards.
The data for the INFOSYS share was obtained from Kaggle; from it we use the Open price, High price, Low price, Close price, Adjusted Close price, and Volume. These variables were chosen because together they describe the main components of a share's trading activity. The calculations were carried out in Excel, where we computed correlations and regressions, constructed index numbers, and forecast the data.

REVIEW OF LITERATURE
Regression techniques are also part of the machine learning toolkit. In 1805, Legendre published the method of least squares, the earliest form of regression. In 1821, Gauss published a further development of the theory of least squares, which included the Gauss-Markov theorem. In the nineteenth century, Francis Galton used the term "regression" to describe a biological phenomenon. Galton's work was later placed in a statistical context by Udny Yule and Karl Pearson [2]. The regression rule was applied to the testing data to predict future stock price trends, and the test result was then evaluated [1].
In a regression model, the predicted value Y is expressed as a function of x and b:
Y = f(x, b)
where Y is the dependent variable, x is the independent variable, and b is the unknown parameter.
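In the simple linear case this reduces to Y = a + b*x, whose parameters can be estimated by least squares. A minimal sketch in pure Python (illustrative data, not the actual share prices):

```python
def fit_least_squares(x, y):
    """Estimate intercept a and slope b of y = a + b*x by least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # b = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Points lying exactly on y = 1 + 2x recover a = 1, b = 2.
a, b = fit_least_squares([1, 2, 3, 4], [3, 5, 7, 9])
```

In practice Excel's Data Analysis regression (used in this report) performs the same least-squares fit, extended to standard errors and significance tests.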
DATA FOR SHARE PRICES:

DIAGRAMMATIC REPRESENTATION
INTERPRETATION
REGRESSION
1. Open Price
Regression Statistics
Multiple R          0.146091
R Square            0.021343
Adjusted R Square   0.000954
Standard Error      0.024652
Observations        50

Multiple R - The multiple R value is 0.146091. This shows that the dependent variable and the independent variables, taken together, have a weak positive linear relationship.
R Square - The R Square value is 0.021343. This indicates that the independent variables in the model account for only around 2.13% of the variance in the dependent variable. This is a relatively low number, indicating that the model does not adequately explain the dependent variable's variability.
Adjusted R Square - The adjusted value is even lower than the R Square, which suggests that adding independent variables may not be strengthening the explanatory power of the model.

Standard Error – The standard error value is 0.024652. The predictions of the model are said to
be more accurate when the standard error is lower. The scale of the y dependent variable,
however, determines how this result should be interpreted.

Observations - Our dataset contains 50 observations, which is a sufficient sample size for most statistical analyses. A greater sample size can frequently increase the accuracy of the findings.
ANOVA
            df    SS        MS        F         Significance F
Regression  1     0.000636  0.000636  1.046787  0.311377
Residual    48    0.029170  0.000608
Total       49    0.029806

Regression -The p-value of 0.311377033 is greater than the conventional significance level of
0.05, and the F-statistic of 1.046787469 is relatively low. This suggests that the independent
variable may not adequately explain the variance in the dependent variable.
Residual - The residual component represents the unexplained variance in the dependent variable after accounting for the regression model. The mean square error (MSE) of 0.000608 indicates the average squared difference between the actual values and the values predicted by the model.
Total - The total component in a regression model represents the total variance in the dependent variable. It has 49 degrees of freedom and a sum of squares of 0.029806. This total is the sum of the variance explained by the regression and the residual (unexplained) variance.
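The identity behind this ANOVA table, that the total sum of squares equals regression plus residual and that R-squared is the regression share, can be checked directly against the rounded numbers reported above:

```python
# Sums of squares from the Open Price ANOVA table (rounded).
ss_reg, ss_res = 0.000636, 0.02917
ss_total = 0.029806

# Total SS decomposes into explained + unexplained parts.
assert abs((ss_reg + ss_res) - ss_total) < 1e-6

# R-squared is the explained fraction of the total variance,
# matching the R Square of about 0.0213 in the regression statistics.
r_squared = ss_reg / ss_total
```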

              Coefficients  Standard Error  t Stat    P-value   Lower 95%   Upper 95%
Intercept     -5.58652      6.11407         -0.91372  0.365433  -17.879681  6.706642
X Variable 1  0.000172      0.000169        1.023126  0.311377  -0.000166   0.000511

Intercept- The intercept in the model is -5.58652, but it is not statistically significant (p-value =
0.365433). This means that the intercept may not have a significant impact on the dependent
variable when all other variables are considered.
X Variable 1- The effect of X Variable 1 on the dependent variable is 0.000172, but it is not
statistically significant (p-value = 0.311377). This indicates that the relationship between X
Variable 1 and the dependent variable may be spurious, and we cannot conclude that X
Variable 1 has a causal effect on the dependent variable.
2. High Price
Regression Statistics
Multiple R          0.130138062
R Square            0.016935915
Adjusted R Square   -0.003544587
Standard Error      0.030654502
Observations        50
Multiple R - The multiple correlation coefficient (R) measures the strength and direction of the relationship between the independent and dependent variables in the regression model. A multiple R of 0.1301 indicates a weak positive linear relationship, meaning the independent variable explains only a small portion of the variation in the dependent variable.
R Square - The coefficient of determination (R-squared) measures the goodness of fit of the regression model. It represents the proportion of the variance of the dependent variable explained by the independent variables. In this case, the R-squared is 0.0169, which is very small, indicating that the regression model has a limited ability to explain changes in the dependent variable.
Adjusted R Square- The adjusted R-squared is a modified version of the R-squared statistic that
penalizes model complexity, making it a better measure of model fit than R-squared, especially
when there are many independent variables in the model. A negative adjusted R-squared (-
0.0035 in this case) indicates that the model may be overfitting the data or that the
independent variables are not significantly related to the dependent variable.
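The negative adjusted R-squared noted above can be reproduced from the usual adjustment formula, 1 - (1 - R^2)(n - 1)/(n - k - 1), using the values reported in the table (n = 50 observations, k = 1 predictor):

```python
# R-squared, sample size, and predictor count from the High Price model.
r2, n, k = 0.016935915, 50, 1

# Adjusted R-squared penalizes each added predictor via the df ratio;
# here it comes out to about -0.00354, matching the table.
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
```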
Standard error of the regression - The standard error of the regression measures the average difference between the predicted values and the true values of the dependent variable. It is calculated as the square root of the mean squared error. In this case, the value is 0.0307, which represents the average size of the prediction error. A small SER is often desired because it indicates a better model fit. However, since the R-squared value is low, the prediction model is still not very accurate.
Observations- A dataset with 50 observations is generally considered sufficient for most
statistical analysis. However, increasing the sample size can often improve the accuracy of the
findings.
ANOVA
            df    SS           MS           F         Significance F
Regression  1     0.000777064  0.000777064  0.826929  0.367706
Residual    48    0.045105526  0.000939698
Total       49    0.045882590

Regression - The regression model includes a single independent variable, which is not
statistically significant in explaining the variance in the dependent variable. The low F-statistic
(0.82692872) and high p-value (0.36770576) suggest that there is insufficient evidence to
conclude that the independent variable has a significant effect on the dependent variable.
Residual - The residual is the amount of unexplained variation in the dependent variable after the regression model is taken into account. The mean square error (MSE) of 0.000939698 represents the mean squared difference between the actual and predicted values.
Total- The total variance is the overall variation in the dependent variable, encompassing both
the variance explained by the regression model and the unexplained variance (residual). The
sum of squares for the total variance is 0.04588259.

              Coefficients  Standard Error  t Stat     P-value   Lower 95%   Upper 95%
Intercept     -6.2289678    7.602868        -0.819292  0.416669  -21.515558  9.057622
X Variable 1  0.000190618   0.000210        0.909356   0.367706  -0.000231   0.000612

Intercept - The intercept in the linear regression model is -6.2289678, but it is not statistically significant (p-value = 0.416669). This indicates that the intercept may not have a meaningful effect on the dependent variable when all other independent variables are considered.
X Variable 1 - The coefficient of X Variable 1 is 0.000191, but it is not statistically significant (p-value = 0.367706). This indicates that the relationship between X Variable 1 and the dependent variable may be due to chance, and we cannot conclude that X Variable 1 has a causal effect on the dependent variable.
3. Low Price
Regression Statistics
Multiple R          0.26391
R Square            0.069649
Adjusted R Square   0.050266
Standard Error      0.024066
Observations        50
Multiple R - The multiple correlation coefficient (R) measures the relationship between the independent and dependent variables in the regression model. In this case, R is 0.2639, indicating a modest positive relationship: the independent variable explains some, but not much, of the variation in the dependent variable.
R Square - The coefficient of determination (R-squared) shows the proportion of the variance of the dependent variable explained by the independent variables in the regression model. In this case, the R-squared is 0.0696, meaning approximately 6.96% of the variance in the dependent variable is explained by the independent variables. This shows that the regression model has limited ability to predict or explain changes in the dependent variable.
Adjusted R square - The adjusted R-squared is a modified version of R-squared that penalizes the addition of irrelevant variables to a regression model. It is a better measure of the model's fit than R-squared, especially when there are many independent variables in the model. In this case, the adjusted R-squared is 0.0503, which is lower than R-squared. This suggests that some of the independent variables in the model may not be contributing meaningfully to the explanation of the dependent variable. Note that the adjusted R-squared can even be negative, which would indicate that the model is not a good fit for the data.
Standard error of the regression - The standard error of the regression (SER) measures the average difference between the predicted values and the true values of the dependent variable. It is calculated as the square root of the mean squared error (MSE). In this case, the SER is 0.0241, which represents the typical size of the prediction error. A smaller SER is generally desirable as it indicates better model fit. However, the fact that the SER is not zero shows that there is still some variability in the dependent variable that the model does not capture.
Observations - Our data set contains 50 observations, which is a sufficient sample for most analyses. Larger samples generally increase the accuracy of study results.
ANOVA
            df    SS        MS        F         Significance F
Regression  1     0.002081  0.002081  3.593413  0.064039
Residual    48    0.027800  0.000579
Total       49    0.029882

Regression: The regression model includes a single independent variable, which is not
statistically significant in explaining the variation in the dependent variable. The F-statistic of
3.593412673 is relatively moderate, and the associated p-value of 0.064038612 is slightly higher
than the conventional significance level of 0.05. This suggests that there is insufficient evidence
to conclude that the independent variable has a statistically significant effect on the dependent
variable. However, the p-value is close enough to the significance level to warrant further
investigation. It is possible that the relationship between the independent and dependent
variables is too weak to be detected by the statistical test, or that the data is too noisy to reveal
the relationship.
Residual: The residual component represents the unexplained variance in the dependent
variable after considering the regression model. The mean square error (MSE) is 0.000579177,
which indicates the average amount by which the actual values differ from the predicted values
by the regression model.
Total: The total variance is the overall variation in the dependent variable, encompassing both
the variance explained by the regression model and the unexplained variance (residual). The
sum of squares for the total variance is 0.029881698.
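The F-statistic in the table above is simply the ratio of the two mean squares, each being its sum of squares divided by its degrees of freedom. Using the rounded values from the Low-price ANOVA table:

```python
# Sums of squares and degrees of freedom from the Low Price ANOVA table.
ss_reg, df_reg = 0.002081, 1
ss_res, df_res = 0.0278, 48

ms_reg = ss_reg / df_reg   # mean square for the regression
ms_res = ss_res / df_res   # mean square error (residual)

# Ratio of mean squares gives F of about 3.59, as reported.
f_stat = ms_reg / ms_res
```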

           Coefficients  Standard Error  t Stat    P-value   Lower 95%   Upper 95%
Intercept  -10.65762     5.968823        -1.78555  0.080491  -22.658749  1.343496
Date       0.000312      0.000165        1.89563   0.064039  -0.000019   0.000643

Intercept: The intercept in the linear regression model is -10.65762609, but it is not statistically
significant (p-value = 0.080491012). This means that there is not enough evidence to conclude
that the intercept is different from zero. The 95% confidence interval for the intercept includes
both positive and negative values, further supporting the conclusion that the intercept is not
statistically different from zero.
X variable: The results suggest that there may be a weak positive relationship between the
"Date" variable and the dependent variable, but the relationship is not statistically significant.
This means that we cannot conclude that the "Date" variable has a causal impact on the
dependent variable, and the observed relationship may be due to chance or other factors.
4. Close Price
Regression Statistics
Multiple R          0.274018921
R Square            0.075086369
Adjusted R Square   0.055407355
Standard Error      0.029458512
Observations        49

Multiple R: The multiple correlation coefficient (R) measures the strength and direction of the relationship between the independent variables and the dependent variable. It is the square root of the coefficient of determination (R-squared), which measures how much of the variation in the dependent variable is explained by the independent variables. Here R is approximately 0.274, a weak correlation, indicating that the independent variable is not a good predictor of the dependent variable.
R Square: The coefficient of determination (R-squared) measures how well the model fits the data. It is the proportion of the variation in the dependent variable explained by the independent variables in the model. An R-squared of approximately 0.075 means that the independent variables explain approximately 7.5% of the variance in the dependent variable. This is a low value, indicating that the model is not very good at predicting the dependent variable. However, the R-squared value can be affected by many factors, such as sample size, the number of independent variables, and the type of data used. It is therefore important to also consider the significance of individual variables and the predictive power of the model when interpreting the results.
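The statement that Multiple R is the square root of R-squared can be verified directly from the values reported for this model:

```python
import math

# R Square from the Close Price regression statistics.
r_squared = 0.075086369

# Its square root reproduces the reported Multiple R of about 0.274019.
multiple_r = math.sqrt(r_squared)
```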
Adjusted R Square: The adjusted coefficient of determination (adjusted R-squared) measures how well a statistical model predicts the outcome after adjusting for the number of independent variables in the model. It penalizes R-squared for variables that do not add explanatory power. The adjusted R-squared here is approximately 0.055, indicating that the model explains roughly 5.5% of the variance even after accounting for the number of independent variables. This is a low value, again suggesting limited predictive ability. As before, sample size, the number of variables, and the type of data can all influence this figure, so it should be interpreted alongside the significance of individual variables and the overall predictive power of the model.
Standard Error: The standard error of a regression measures the average deviation between the observed and predicted values of the regression model; it is the square root of the mean squared error. The lower the standard error, the better the model fits the data. In this case the standard error is about 0.0295, which is small in absolute terms, although the low R-squared shows that the model's explanatory power is still limited.
Observations: This data set contains 49 observations, which is a sufficient sample for most analyses. Larger samples generally increase the accuracy of study results.

ANOVA
            df    SS           MS           F         Significance F
Regression  1     0.003311154  0.003311154  3.815556  0.056746
Residual    47    0.040786785  0.000867804
Total       48    0.044097939

Regression: The regression model includes a single independent variable that is not quite significant in explaining the variance of the dependent variable. The F statistic is 3.815556 and the corresponding p-value is 0.056746, which is slightly above the 0.05 significance level. This suggests there may be some evidence of an effect of the independent variable, but it falls just short of statistical significance.
Residual: The residual represents the amount of unexplained change in the dependent variable
after accounting for the regression model. The mean square error (MSE) is 0.000867804, which
represents the average difference between the actual values and the values predicted by the
regression model.
Total: Total variation represents all the variation of the dependent variable, including the
variation explained by the regression model and the unexplained variation (residuals). The sum
of squares of the total variation is 0.044097939.

           Coefficients  Standard Error  t Stat     P-value   Lower 95%   Upper 95%
Intercept  -14.036819    7.529220        -1.864313  0.068530  -29.183655  1.110018
Date       0.000405483   0.000207584     1.953345   0.056746  -0.000012   0.000823

Intercept: The intercept in the linear regression model is -14.036819, but it is not statistically significant (p-value = 0.068530). This means that there is not enough evidence to conclude that the intercept differs from zero. The 95% confidence interval for the intercept includes both positive and negative values, supporting the conclusion that it is not significantly different from zero.
X variable: The results show that there may be a weak positive relationship between the "Date" variable and the dependent variable, but the relationship is not statistically significant. We therefore cannot conclude that the "Date" variable has a causal impact on the dependent variable; the observed relationship may be due to chance or other factors.
5. Adjusted Close Price
Regression Statistics
Multiple R          0.274019
R Square            0.075087
Adjusted R Square   0.055408
Standard Error      0.019981
Observations        49
Multiple R: The multiple correlation coefficient (R) is approximately 0.274, indicating a weak positive linear relationship between the dependent variable and the independent variables taken together. It is important to remember that correlation does not imply causation: just because two variables are related does not mean that one causes the other; a third variable may be driving both. Additionally, the strength of the correlation coefficient can be affected by factors such as sample size, the number of independent variables, and the type of data used, so it should be interpreted alongside the significance of individual variables and the predictive power of the model.
R Square: The R-squared value is approximately 0.075, indicating that the independent variables in the model explain approximately 7.5% of the variance in the dependent variable. This means the model explains little of the variation. As noted above, the R-squared value can be affected by factors such as sample size, the number of independent variables, and the type of data used, so other measures should be considered alongside it when interpreting the results.
Adjusted R Square: The adjusted R-squared value is approximately 0.055, indicating that the independent variables explain roughly 5.5% of the variance even after adjusting for the number of variables in the model. Even accounting for model complexity, the model explains little of the variation in the dependent variable. The adjusted R-squared is a more reliable measure of model fit than R-squared because it penalizes the inclusion of unnecessary variables.
Standard Error: The standard error (SE) of the regression is approximately 0.0199 and represents the average deviation between the true values of the dependent variable and the values estimated by the regression model. A lower standard error indicates a better fit and better predictive ability. The standard error can also be used to construct confidence intervals for the coefficient estimates: a confidence interval is the range of values likely to contain the true value of the parameter. The wider the confidence interval, the less precise the estimate; a lower standard error therefore corresponds to a narrower confidence interval and a more precise estimate.
Observations: This data set contains 49 observations, which is a sufficient sample for most analyses. Larger samples generally increase the accuracy of study results.
ANOVA
            df    SS        MS        F         Significance F
Regression  1     0.001523  0.001523  3.815572  0.056746
Residual    47    0.018764  0.000399
Total       48    0.020287

Regression: The regression model has a single independent variable that is not quite significant in explaining the variance of the dependent variable. The F statistic is 3.815572 and the corresponding p-value is 0.056746, which is slightly above the 0.05 significance level. This suggests some evidence that the independent variable may influence the dependent variable, but the relationship is not statistically significant.
Residuals: Residuals represent the unexplained change in the dependent variable after
considering the regression model. The mean square error is 0.000399226, which represents the
average difference between the actual values and the values predicted by the regression model.
Total: Total variation represents all the variability of the dependent variable, including the
variation explained by the regression model and the unexplained difference . The sum of
squares of the total variation is 0.020286909.

           Coefficients  Standard Error  t Stat    P-value   Lower 95%    Upper 95%
Intercept  -9.520691     5.106798        -1.86432  0.068529  -19.794243   0.752862
Date       0.000275      0.000141        1.953349  0.056746  -0.0000082   0.000558

Intercept: The intercept in the linear regression model is -9.520691, but it is not statistically significant (p-value = 0.068529). This means that there is not enough evidence to conclude that the intercept differs from zero. The 95% confidence interval for the intercept includes both positive and negative values, supporting this conclusion.

X variable: A one-unit increase in the "Date" variable is associated with an estimated increase of 0.000275 in the dependent variable, all else being equal. However, the p-value for the Date variable is 0.056746, slightly above the 0.05 significance level. This shows that the Date variable is not significant at the 0.05 level, meaning that its effect on the dependent variable may not be meaningful in the context of this model. The fact that the 95% confidence interval for the Date coefficient includes both positive and negative values further supports this interpretation.
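The 95% confidence interval quoted in the coefficient table follows from coefficient plus or minus t_crit times standard error. With 47 residual degrees of freedom the two-sided 95% critical t is about 2.012 (an approximation; the exact value would come from a t-table):

```python
coef = 0.000275025   # Date coefficient from the table
se = 0.000141        # its standard error (rounded in the table)
t_crit = 2.012       # approx. two-sided 95% critical t for 47 df

# The interval spans roughly -8e-06 to 6e-04 and straddles zero,
# which is why the coefficient is judged not significant at the 5% level.
lower = coef - t_crit * se
upper = coef + t_crit * se
```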
6. Volume
Regression Statistics
Multiple R          0.292316011
R Square            0.085448650
Adjusted R Square   0.066395497
Standard Error      3279933.378
Observations        50

Multiple R: The multiple correlation coefficient (R) is approximately 0.292, indicating a weak relationship between the dependent variable and the independent variables taken together. It is worth noting that correlation does not equal causation: just because two variables are related does not mean that one causes the other; there may be a third variable driving both. In addition, the strength of the correlation coefficient can be affected by factors such as sample size, the number of independent variables, and the type of data used, so it should be interpreted alongside the significance of individual variables and the predictive power of the model.

R Square: The R-squared value is approximately 0.085, indicating that the independent variables in the model explain approximately 8.5% of the variance in the dependent variable. This is a low R-squared value, showing that the model explains little of the variation. The R-squared value can be affected by factors such as sample size, the number of independent variables in the model, and the type of data used, so other measures should also be considered when interpreting the results.
Adjusted R Square: The adjusted R-squared value is approximately 0.066, indicating that the independent variables explain roughly 6.6% of the variance even after adjusting for the number of variables in the model. Even when model complexity is taken into account, the model explains little of the variation in the dependent variable. The adjusted R-squared is a more reliable measure of model fit than R-squared because it penalizes the inclusion of variables that do not add explanatory power.

Standard Error: The standard error of the regression is approximately 3,279,933. This figure looks large, but it must be interpreted relative to the scale of the Volume variable, whose values are in the millions; it represents the typical size of the prediction error in those units.
Observations: This data set contains 50 observations, which is a sufficient sample for most analyses. Larger samples generally increase the accuracy of study results.

ANOVA
            df    SS           MS           F         Significance F
Regression  1     4.82468E+13  4.82468E+13  4.484751  0.039404
Residual    48    5.16382E+14  1.07580E+13
Total       49    5.64629E+14

Regression: The regression model includes a single independent variable, which is statistically
significant in explaining the variance in the dependent variable. The F-statistic of 4.484751149 is
significant, with an associated p-value of 0.039404429, which is lower than the conventional
significance level of 0.05. This suggests that there is evidence to conclude that the independent
variable has a statistically significant effect on the dependent variable.

Residual: The residual component represents the unexplained variance in the dependent
variable after considering the regression model. The mean squared error (MSE) is 1.0758E+13,
indicating the average amount by which the actual values differ from the predicted values by
the regression model.
Total: The total variance represents the overall variability in the dependent variable, encompassing both the variance explained by the regression model and the unexplained variance (residual). The sum of squares for the total variance is 5.64629E+14.

           Coefficients  Standard Error  t Stat     P-value   Lower 95%    Upper 95%
Intercept  1727690095    813482474       2.123820   0.038865  92073957.6   3363306232
Date       -47497.4428   22428.54257     -2.117723  0.039404  -92593.05    -2401.8356

Intercept: The intercept in the linear regression model is 1,727,690,095, which represents the estimated value of the dependent variable when all independent variables are zero. The p-value associated with the intercept is 0.038865, which is lower than the conventional significance level of 0.05. This indicates statistically significant evidence that the intercept is different from zero. The 95% confidence interval for the intercept does not include zero, further supporting its statistical significance.

X variable: The coefficient for the "Date" variable is -47,497.4428, which indicates that a one-
unit increase in the "Date" variable is associated with an estimated decrease of 47,497.4428 in
the dependent variable, all else being equal. The p-value associated with the "Date" variable is
0.039404429, which is lower than the conventional significance level of 0.05. This suggests that
there is statistically significant evidence that the "Date" variable has a negative effect on the
dependent variable. The 95% confidence interval for the "Date" variable does not include zero,
further supporting the statistical significance of the "Date" variable.
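The t statistic behind this significance judgment is just the coefficient divided by its standard error, which can be checked against the values in the coefficient table:

```python
# Date coefficient and standard error from the Volume regression table.
coef = -47497.4428
se = 22428.54257

# Ratio reproduces the reported t Stat of about -2.1177; a magnitude
# above roughly 2 is what pushes the p-value just below 0.05.
t_stat = coef / se
```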
CORRELATION

            Date        Open        High        Low         Close       Adj Close   Volume
Date        1
Open        0.146091    1
High        0.130138    0.840313    1
Low         0.263910    0.866311    0.864556    1
Close       0.281330    0.753844    0.917694    0.894844    1
Adj Close   0.281330    0.753845    0.917694    0.894845    1.000000    1
Volume      -0.292316   0.034771    0.269390    0.039852    0.115892    0.115891    1
The correlation matrix of Infosys shares from March 16 to May 25 shows that the price
indicators (open price, high price, low price, closing price, and adjusted closing price)
are strongly correlated with one another, with coefficients mostly between 0.84 and 0.92.
This means that these indicators are closely related, which is expected, since all of them
are measures of the same underlying quantity: the share price.
Trading volume has only weak correlations with the price indicators and a negative
correlation with the date (r = -0.29), suggesting that trading volume tended to decline
over the period; this relationship, however, is not very strong.
Overall, the correlation matrix suggests that investors in Infosys stock should focus on
price-related metrics when making investment decisions. Volume can be a useful indicator
but should be considered along with other factors.
Description of the relationship (rule of thumb):
Strong positive correlation (r > 0.7): the two variables move closely together in the same direction.
Moderate positive correlation (0.3 < r < 0.7): the variables move in the same direction, but less consistently.
Negligible correlation (-0.3 < r < 0.3): there is little or no linear relationship between the variables.
Moderate negative correlation (-0.7 < r < -0.3): the variables move in opposite directions, but weakly.
Strong negative correlation (r < -0.7): the variables move closely together in opposite directions.
It is worth noting that correlation does not equal causation. Just because two variables
are related does not mean that one causes the other; a third variable may be driving the
changes observed in both.
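A correlation matrix like the one above can be generated in a single call with pandas. The figures in this sketch are invented placeholders, not the actual 1999 Infosys quotes.

```python
# Minimal sketch of producing a correlation matrix with pandas.
# These prices and volumes are made-up placeholders, not real data.
import pandas as pd

df = pd.DataFrame({
    "Open":   [0.65, 0.67, 0.70, 0.69, 0.72],
    "High":   [0.68, 0.70, 0.73, 0.71, 0.75],
    "Low":    [0.63, 0.66, 0.68, 0.67, 0.70],
    "Close":  [0.66, 0.69, 0.72, 0.70, 0.74],
    "Volume": [5.0e6, 4.2e6, 3.9e6, 4.5e6, 3.1e6],
})

corr = df.corr()      # Pearson correlation by default
print(corr.round(2))  # price columns correlate strongly; Volume does not
```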
INDEX NUMBER:

Total of Infosys Shares (monthly sums)

Month   Open     High     Low      Close    Adj Close   Volume
March   8.0791   8.2988   7.9072   8.041    5.4539      89376000
April   13.847   14.116   13.54    13.804   9.36        88140800
May     11.52    11.82    11.4     11.65    7.9         70470400

Aggregate Index Number (March = 100)

Month   Open   High   Low   Close   Adj Close   Volume
March   100    100    100   100     100         100
April   171    170    171   172     172         99
May     143    142    144   145     145         79

The table shows the Aggregate Index Number for Infosys shares from March to May 1999, with
March as the base value of 100. The index number for April is 172, which means that the
share price increased by about 72% from March to April, a large rise over a relatively
short period. The index number for May is 145, which means that the price fell by roughly
16% from April to May but remained about 45% higher than in March.
The Aggregate Index Number therefore shows a sharp rise in the share price followed by a
partial pullback in May, leaving the stock on an overall upward trend for the period.
Plausible contributing factors include the company's strong financial performance and
generally bullish sentiment in the share market at the time. It is important to note,
however, that the share market is volatile and past performance is not indicative of
future results.
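The index computation itself is a simple ratio to the March base. The sketch below reproduces the Close column of the index table from the monthly totals given above.

```python
# Aggregate (simple) index numbers: each month's total expressed as a
# percentage of the March base. Totals are taken from the table above.
monthly_close = {"March": 8.041, "April": 13.804, "May": 11.65}

base = monthly_close["March"]
index = {month: round(100 * total / base) for month, total in monthly_close.items()}
print(index)  # March is the base month, so its index is 100
```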

CONCLUSION
Based on the provided regression analysis for Infosys share price data with various independent
variables, here are the conclusions for each of the dependent variables (Open, High, Low, Close,
Adj Close, Volume):
1. Open Price Prediction:
The regression model for predicting the Open price of Infosys share is not statistically significant,
as indicated by the low R-squared value (0.021342631) and the high p-value (0.311377033).
Neither the intercept nor the coefficient for X Variable 1 (Open) is statistically significant based
on their t-stats and p-values.
2. High Price Prediction:
The regression model for predicting the High price of Infosys share also lacks statistical
significance, with a low R-squared value (0.016935915) and a high p-value (0.36770576).
Both the intercept and the coefficient for X Variable 1 (High) are not statistically significant.
3. Low Price Prediction:
The regression model for predicting the Low price shows slightly better performance, with a
modest R-squared value (0.069648672), though its p-value is slightly above 0.05 (0.064038612).
The intercept is not statistically significant, and the coefficient for Date is significant
only at the 10% level, not at the conventional 5% level.
4. Close Price Prediction:
The regression model for predicting the Close price exhibits a similar level of statistical
significance to the Low price model. It has an R-squared value of 0.075086369 and a p-value
slightly above 0.05 (0.05674623).
The intercept is not statistically significant, and the coefficient for Date is marginally
significant (below 0.10 but above 0.05).
5. Adjusted Close Price Prediction:
The regression model for predicting the Adjusted Close price has a similar level of
statistical significance to the Close price model, with an R-squared value of 0.075086675
and a p-value slightly above 0.05 (0.056745705).
The intercept is not statistically significant, and the coefficient for Date is marginally
significant (below 0.10 but above 0.05).
6. Volume Prediction:
The regression model for predicting the Volume of Infosys share has a relatively higher R-
squared value (0.08544865) compared to the price models, indicating some level of explanatory
power.
The regression model is statistically significant, as the p-value associated with the F-statistic is
below 0.05 (0.039404429).
Both the intercept and the coefficient for Date (Volume) are statistically significant.
In summary, the regression models for predicting the Open, High, Low, Close, and Adjusted
Close prices of Infosys share do not exhibit strong statistical significance and have low
explanatory power. On the other hand, the regression model for predicting the Volume of
Infosys share shows some level of statistical significance and explanatory power. Further
research and the inclusion of additional variables may be necessary to build more robust
models for predicting stock price trends.
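The model-by-model comparison above can be automated by regressing each series on the date index and collecting the R-squared values. A minimal sketch follows; the two series are invented placeholders, not the real Infosys data.

```python
# Sketch: compute R-squared of a simple linear trend for several series.
# Both series are synthetic placeholders, not actual market data.
import numpy as np

dates = np.arange(12, dtype=float)
series = {
    "Close":  0.67 + 0.002 * dates + 0.01 * np.sin(dates),   # weak trend, noisy
    "Volume": 9e6 - 2e5 * dates + 1e5 * np.cos(dates),       # strong downward trend
}

def r_squared(x, y):
    """R^2 of a simple linear regression of y on x."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()

scores = {name: r_squared(dates, y) for name, y in series.items()}
print({name: round(s, 3) for name, s in scores.items()})
```

A higher R-squared for one series, as with Volume here, indicates that the date index explains more of that series' variability, which is the pattern the conclusions above describe.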
Furthermore, the correlation matrix for Infosys shares demonstrates a strong link among the
price indicators, while trading volume shows only weak correlations with the other
indicators. As a result, when making investment decisions, investors should concentrate
primarily on the price indicators while taking volume and other considerations into
account alongside them.
However, the Aggregate Index Number for Infosys shares from March to May 1999 shows a
considerable and rapid rise in the share price. In particular, from March to April the
share price increased by a remarkable 72%, signalling strong investor confidence, probably
fueled by favourable circumstances such as Infosys's solid financial performance and the
generally bullish sentiment in the share market at the time. Although there was a downturn
in May, with the index falling by about 16%, it is crucial to remember that the share price
remained 45% higher than it had been in March. This implies that, despite the May fall,
Infosys shares maintained an overall upward trend over this period.

REFERENCES AND WEBSITES

(1) Freedman, David (2005), Statistical Models: Theory and Practice, Cambridge
University Press.

(2) Kotsiantis, S.; Kanellopoulos, D.; Pintelas, P. (2006), "Data Preprocessing
for Supervised Learning", International Journal of Computer Science.
Drive link:
https://docs.google.com/spreadsheets/d/1cX2qdp-
aVnifeyFbhRHlYau5GMX68RqF/edit?usp=drive_link&ouid=102658188580921239489&rtpof=tru
e&sd=true

source website:
EPAM, Infosys and Capgemini stock prices (kaggle.com)

