Analytics-
Module 2
Proprietary content. ©Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited
Table of Contents
Problem : Company Analysis
1. Random Forest on Train Data
2. Random Forest on Test Data
3. LDA Model on Train Data
4. LDA Model on Test Data
5. Comparison between Logistic Regression, LDA and Random Forest
6. Recommendation on Credit Data
Businesses or companies can fall prey to default if they are not able to keep up with their debt
obligations. A default leads to a lower credit rating, which in turn reduces the company's
chances of getting credit in the future and may force it to pay higher interest on existing debts as well
as on any new obligations. An investor would want to invest in a company that
can handle its financial obligations, grow quickly, and manage that growth
at scale.
A balance sheet is a financial statement of a company that provides a snapshot of what a company
owns, owes, and the amount invested by the shareholders. Thus, it is an important tool that helps
evaluate the performance of a business.
The available data includes information from the financial statements of the companies for the
previous year (2015). The net worth of each company in the following year
(2016) is also provided and can be used to derive the label field.
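As a minimal sketch of how such a label could be derived, the snippet below flags a company as a default when its next-year net worth is negative. That rule, the field name `networth_next_year`, and the sample values are assumptions for illustration; the text above only says that the label comes from the 2016 net worth.

```python
# Hypothetical records: field names and values are illustrative, not from the dataset.
companies = [
    {"name": "A", "networth_next_year": 125.4},
    {"name": "B", "networth_next_year": -18.9},
    {"name": "C", "networth_next_year": 0.0},
]


def derive_default_label(networth_next_year):
    """Return 1 (default) if next-year net worth is negative, else 0.

    Assumption: a negative net worth marks a default; adjust the rule
    if the project defines the label differently.
    """
    return 1 if networth_next_year < 0 else 0


labels = [derive_default_label(c["networth_next_year"]) for c in companies]
print(labels)
```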
Question 1.8- Build a Random Forest Model on Train Dataset. Also showcase your model building
approach.
Ans: Random Forest is a supervised machine learning algorithm that is widely used for
classification and regression problems. It builds decision trees on different samples of the data and
combines them by majority vote for classification and by averaging for regression.
The data is split into train and test sets in a 67:33 ratio, and a Random Forest is fitted on the train set.
We have used the above parameters as grid parameters.
On the train dataset, the Random Forest model achieves 98% accuracy and the following
results:
Through grid search, we built a Random Forest model and got the following parameters:
RandomForestClassifier(max_depth=6, max_features=6, min_samples_leaf=14,
min_samples_split=40, n_estimators=201, random_state=1)
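The split-then-grid-search approach described above might look roughly like the sketch below. It uses scikit-learn on synthetic data; the grid values are hypothetical and deliberately small, since the grid actually used in the report is not reproduced in the text.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the company data (assumption: real features are not shown here).
X, y = make_classification(n_samples=300, n_features=10, random_state=1)

# 67:33 train/test split, as stated in the report.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=1
)

# Hypothetical grid; kept small so the search runs quickly.
param_grid = {
    "max_depth": [4, 6],
    "min_samples_leaf": [5, 14],
    "n_estimators": [25, 51],
}

grid = GridSearchCV(
    RandomForestClassifier(random_state=1),
    param_grid=param_grid,
    cv=3,
    scoring="accuracy",
)
grid.fit(X_train, y_train)

# The best estimator is refit on the full train set by default (refit=True).
best_model = grid.best_estimator_
train_accuracy = best_model.score(X_train, y_train)
test_accuracy = best_model.score(X_test, y_test)
print(grid.best_params_)
print("train accuracy:", train_accuracy)
print("test accuracy:", test_accuracy)
```

Comparing the train and test accuracy of the selected model is a quick check for overfitting before looking at the full classification report.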
Classification report :
Question 1.9 Validate the Random Forest Model on test Dataset and state the performance
metrics. Also state interpretation from the model.
Fig 3
Classification Matrix:
Area Under Curve: 0.9881682797557322
Fig 4
Observation: On the test dataset, the Random Forest model achieves an accuracy of 97% and an
AUC of about 0.99 (0.988).
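To make the two metrics concrete, here is a small self-contained sketch that computes accuracy and AUC by hand. The labels and predicted probabilities are made up for illustration and are not the report's data; the 0.5 threshold is likewise an assumption.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and predicted default probabilities (not from the report).
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(acc)
print(auc)
```

Accuracy depends on the chosen threshold, while AUC only depends on how the scores rank defaults against non-defaults, which is why the two numbers can differ.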
Question 1.10 Build an LDA Model on Train Dataset. Also showcase your model building approach.
Fig 5
Ans:
Area under curve for Test data:
Fig 7
Observation: The accuracy scores show that the model performs better on the test data than on the train data.
Build LDA Model
Number of rows and columns of the training set for the independent variables: (2402, 63)
Number of rows and columns of the training set for the dependent variable: (2402,)
Number of rows and columns of the test set for the independent variables: (1184, 63)
Number of rows and columns of the test set for the dependent variable: (1184,)
Fig 8
Classification Report:
Ans :
Fig
Observation: From the above comparison, we see that Random Forest performs better on this
dataset.
According to the model results, the Random Forest model gives the best results; therefore,
it should be used for prediction.
Book value adj. unit curr. and Net worth are the two most important factors for predicting
Net worth for the upcoming year.
Compared to the Logistic Regression model, the LDA model provides better results.
The correlation between Net worth and Net worth next year is high, while Value of
output and Cost of production are highly correlated with Gross sales.
In addition, we observe high correlations between Net sales and PBDT, PBIT, PBT, PAT,
and Adjusted PAT.
To avoid multicollinearity in the prediction, we should consider using only the 28 of the
66 variables in the dataset that pass a Variance Inflation Factor (VIF) cutoff of 5.
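For reference, the variance inflation factor of predictor $j$ is usually defined from the $R_j^2$ obtained by regressing that predictor on all the other predictors:

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
```

Under this definition a cutoff of 5 corresponds to $R_j^2 = 0.8$, meaning a predictor reaches the cutoff when the other predictors explain 80% of its variance.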
Question 2.1 Draw Stock Price Graph(Stock Price vs Time) for any 2 given stocks with inference.
Ans: The dataset contains stock prices for Infosys, Axis Bank, SAIL, Shree Cement, Sun Pharma,
Jindal Steel, Indian_Hotel, Mahindra & Mahindra, Idea_Vodafone, and Jet Airways
for the period Mar 2014 to Mar 2020.
Fig 9
Observation:
Infosys shares followed an increasing trend, dropped in 2017, rose again,
and then dropped in 2020.
Since 2014, Shree Cement has seen an increasing trend, was stable in 2017, and
saw an increasing trend again in 2018.
In 2018, the share prices of M&M and Jet Airways plummeted sharply and have been
fluctuating since then.
Question 2.3 Calculate Stock Means and Standard Deviation for all stocks with inference
Ans:
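A minimal sketch of the calculation, assuming the mean and standard deviation are taken over log returns between consecutive prices (the text does not state how returns were computed). The stock names and prices below are hypothetical.

```python
import math
import statistics

# Hypothetical closing prices; names and values are illustrative only.
prices = {
    "Stock_A": [100.0, 102.0, 101.0, 105.0, 107.0],
    "Stock_B": [50.0, 45.0, 52.0, 44.0, 49.0],
}


def log_returns(series):
    """Log return between consecutive prices: ln(p_t / p_{t-1})."""
    return [math.log(curr / prev) for prev, curr in zip(series, series[1:])]


summary = {}
for name, series in prices.items():
    returns = log_returns(series)
    summary[name] = {
        "mean": statistics.mean(returns),   # average return per period
        "std": statistics.stdev(returns),   # sample standard deviation (volatility)
    }

for name, stats in summary.items():
    print(name, round(stats["mean"], 4), round(stats["std"], 4))
```

The mean serves as a proxy for return and the standard deviation as a proxy for risk, which is how the two quantities are read in the rest of this section.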
Question 2.4 Draw a plot of Stock Means vs Standard Deviation and state your inference.
Ans :
Fig 11
Observation: A stock with a lower mean and a higher standard deviation has no place in a portfolio
when competing stocks offer higher returns at lower risk. Thus, for the data we have here, only a
few stocks remain. Stocks with a higher return for comparable or lower risk are considered better.
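The screening rule in this observation can be sketched as a simple dominance check: a stock is dropped when another stock has a mean at least as high and a standard deviation at least as low. The stock names and numbers below are hypothetical, and `is_dominated` is an illustrative helper rather than part of the original analysis.

```python
# Hypothetical (mean return, standard deviation) pairs; not taken from the report.
stocks = {
    "Stock_A": (0.004, 0.035),
    "Stock_B": (0.003, 0.026),
    "Stock_C": (-0.001, 0.040),
    "Stock_D": (0.001, 0.030),
}


def is_dominated(name, candidates):
    """True if another stock has a mean at least as high and a std at least as low,
    with at least one of the two strictly better."""
    mean, std = candidates[name]
    for other, (o_mean, o_std) in candidates.items():
        if other == name:
            continue
        if o_mean >= mean and o_std <= std and (o_mean > mean or o_std < std):
            return True
    return False


kept = [name for name in stocks if not is_dominated(name, stocks)]
print(kept)
```

In this toy data, Stock_C and Stock_D are dominated, so only Stock_A (highest return) and Stock_B (lowest risk) remain.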
Question 2.5 Conclusion and Recommendations
Ans :
Conclusion:
A stock with a lower mean and a higher standard deviation has no place in a portfolio when
competing stocks offer higher returns at lower risk. Thus, for the data we have here, only a
few stocks remain:
1) one with highest return and low risk
2) one with lowest risk and highest return
Stocks like Shree Cement, Infosys, and Axis Bank offer low risk and high returns, and make
good investments.
Sun Pharma, Mahindra & Mahindra, and SAIL are less risky but generate lower returns.
Vodafone, Jet Airways, and Jindal Steel are poor investments because they have higher risk
and lower returns.
Recommendations:
We recommend using the stock means vs standard deviation plot to assess the risk to
reward ratio. A more volatile stock might give short-term gains but might not be a good
investment in the long term, whereas a less volatile stock might not be a good investment in
the short term but might give a good return in the long term.
Stocks like Shree Cement, Infosys, and Axis Bank offer low risk and high returns, and make
good investments. They are highly recommended for long-term investment.
Sun Pharma, Mahindra & Mahindra, and SAIL are less risky but generate lower
returns. People who are new to the stock market can consider these stocks, as their low
risk and low return give them a chance to learn more about the stock market.
Vodafone, Jet Airways, and Jindal Steel are poor investments because they have higher risk
and lower returns, so people should be very cautious before investing in these stocks for
the long run.
Considering the above insights, investors should choose the stocks that match their
preferences from the options above.