An Internship Report
On
2021-2022
Department of Information Science and Engineering
Bapuji Institute of Engineering and Technology
Davangere – 577004, Karnataka
CERTIFICATE
This is to certify that Mr. PRAJWAL K, bearing USN 4BD17IS062, of the Information Science and
Engineering department has satisfactorily submitted the internship report entitled “FORECASTING
PRICE OF CRUDE OIL USING LONG SHORT-TERM MEMORY (LSTM) BASED ON RNN”.
The internship report has been approved as it satisfies the academic requirements with respect to the
Internship work prescribed for the Bachelor of Engineering Degree of the Visvesvaraya Technological
University, Belagavi, during the year 2021-22.
Mr. Puneeth S P M Tech Mr. Sheik Imran M.Tech. Dr. Poornima B Ph.D.
Assistant professor Assistant professor Professor & Head
External Viva
1.
2.
Rove Labs Pvt Ltd
+91-9591071117|office@rovelabs.com
www.rovelabs.com
CIN: U74999KA2017PTC100899
GSTIN: 29AAICR1415R1ZT
SHREYAS S VERNEKAR
Program Director
COMPANY PROFILE
Rove Labs is a visionary ed-tech company with a mission to power your career through education in
the latest technology and to equip today’s younger generation with 21st century skills. With
our quality curriculum and powerful digital platform, you can now learn from anywhere at
any time. We have trained more than 8 thousand people. Technology and the way we work today
are changing so rapidly that it is difficult to find employees with experience in emerging
technologies and work practices. This skill gap is threatening the sustainability of enterprises
around the world. There is a strong need to revamp the path from education to employment, and from
employment to talent creation, through a carefully designed curriculum crafted to suit future
technology. Our industry-driven curriculum, robust career guidance services and capable
community of Universities, Institutions, Instructors and Employers equip learners with skills
for careers in the 21st century.
Company mission: To promote academic conversations and dialogue that foster creativity and
integrate critical thinking skills within and across all content areas. Our aim is to provide
students with repeated opportunities to practice higher-order thinking and to establish safe,
intellectually risk-free learning environments and resources. Rove Labs consistently cultivates
problem-solving and logical thinking capabilities in students and develops communication and
collaboration among them through sharing ideas and working together.
ACKNOWLEDGEMENT
Salutations to our beloved and highly esteemed institute, "Bapuji Institute of Engineering and
Technology", for having well-qualified staff and labs furnished with the necessary equipment.
I express my sincere thanks to our internal guide Mr. Puneeth S P M Tech, external guide Mr.
Shreyas Vernekar and Internship coordinator Mr. Sheik Imran for giving constant encouragement,
support and guidance throughout the course of the Internship Seminar; without their steady guidance
this seminar report would not have been completed.
I express wholehearted gratitude to Dr. Poornima B., H.O.D. of IS & E. I wish to acknowledge
her valuable help and encouragement, which made my task easy.
I would like to thank our beloved Principal Dr. H.B. Aravind and the Director Prof. Y
Vrushabhendrappa of this Institute for giving me the opportunity and guidance to work on the
Internship Seminar.
I would like to extend my gratitude to all teaching and non-teaching staff of the Department of
Information Science and Engineering for the help and support rendered to me. I have benefited a lot
from the feedback and suggestions given by them.
I would like to extend my gratitude to all my family members and friends, especially for their
advice and moral support.
PRAJWAL K
4BD17IS062
Vision and Mission of the Department
Vision
“To be the center of excellence by adopting technological innovation in academics and research to
develop competent man power for emerging needs of society and industries.”
Mission
M1: To provide quality education to meet the challenges of technological changes to succeed in their
professional career and higher education.
M2: To inculcate the culture of research, innovation and entrepreneur skills among the students.
M3: To groom our students with the quality of team spirit, leadership skills and ethical values, to share
and apply their knowledge for the benefit of the society.
PSO2- Knowledge of Information Technology – Analyze, design, develop and test the
computer based software in the areas related to networks, cloud computing,
web technology, data science and IoT.
PSO3- Profession and Research Ability – Inculcate the knowledge to excel in IT profession,
entrepreneurship and research with ethical standards.
PEO1 - The graduates of the program will have excellence through principles and practices of information
technology combined with fundamentals of engineering.
PEO2 - The graduates of the program will be prepared in diverse areas of information science for their
successful careers, entrepreneurship and higher studies.
PEO3 - The graduates of the program will work effectively as individuals and in teams, exhibiting
leadership qualities and communication skills to meet the goals of the organization.
PEO4 - The graduates of the program will grow in their profession with ethics and management principles to
carry out societal responsibilities.
Program Outcomes (POs) defined by NBA:
COURSE OUTCOMES
ABSTRACT
Machine learning is a technology that allows a software program to become more
accurate at predicting results without being explicitly programmed; ML
algorithms use historical data to predict new outputs. Because of this, ML has received considerable
attention. Nowadays prediction engines have become so popular that they generate accurate and
affordable predictions, much as a human would, and are being used in industry to solve many problems.
Predicting a justified salary for an employee has always been a challenging job for an employer. This
project proposes a salary prediction model that uses a suitable algorithm and the key features required to
predict the salary of an employee. The goal of this report is to predict the salary of a person after a certain
number of years. The graphical representation of predicted salary aims at developing a
computerized system that maintains the salary growth graph for any field and can
predict salary after a certain time period.
CONTENTS
Description
Acknowledgement
VISION, MISSION, PO’s, PSO’s, PEO’s
Course Outcomes
Abstract
Chapter 1. Introduction
1.1 Industry 4.0
1.2 Cognitive Computing
1.3 AI VS ML VS DL
1.4 Machine Learning and its Types
1.5 Skills Required and Tools used in ML
1.6 Challenges in ML
Chapter 2. Data Preprocessing
2.1 Imputing
2.2 Encoding
2.3 Scaling
2.4 Transformation
2.5 Outlier Handling
2.6 Different Algorithms in ML and its use cases
2.7 Other terminologies used in ML
Chapter 3. Salary Prediction Project Using ML
3.1 Description
3.2 Problem Statement
3.3 Proposed System
3.4 Methodology Used
Chapter 4. Literature Survey
Chapter 5. Implementation
5.1 Source Code
5.2 Snapshots
Results
Learning Outcomes
Conclusion
References
Internship on Machine Learning
CHAPTER 1
INTRODUCTION
Before Industry 4.0, there were three prior industrial revolutions that have led to
changes of paradigm in the domain of manufacturing: mechanization through water and steam
power, mass production in assembly lines and automation using information technology.
Industry 1.0 began around the 1780s with the introduction of water and steam power which
helped in mechanical production and improved the agriculture sector greatly. Next, Industry
2.0 is defined as the period when mass production was introduced as the primary means to
production, in general. The mass production of steel helped introduce railways into the
industrial system which consequently contributed to mass production at large. During the 20th
century, Industry 3.0 arose with the advent of the Digital Revolution which is more familiar
compared to Industry 1.0 and 2.0 as most people living today are familiar with industries
leaning on digital technologies in production. Perhaps Industry 3.0 was and still is a direct
result of the huge development in computers and information and communication technology
industries for many countries (Liao et al., 2017). Industry 4.0 has brought change to many
professions. People have always been obligated to learn new everyday tasks but now are also
compelled to use hi-tech gadgets which are fast becoming the most important factor in their
working life (Gorecky et al., 2014).
Machine Learning: Machine Learning is the study of methods that enable a
system to learn automatically from experience and improve accordingly
without being explicitly programmed.
Deep Learning: Deep Learning is a subset of the broader family of Machine
Learning that makes use of Neural Networks (loosely modelled on the neurons working in our brain) to
mimic human brain-like behaviour.
In the supervised learning technique, we train the machine using a "labelled"
dataset, and based on that training, the machine predicts the output. More precisely,
we first train the machine with inputs and their corresponding outputs, and then we ask the
machine to predict the output for the test dataset.
Supervised machine learning can be classified into two types of problems, which are given
below:
• Classification
• Regression
Classification: Classification algorithms predict which of the categories present in the dataset
a data point belongs to. Some real-world examples of classification are spam detection, email filtering,
etc.
Regression: Regression algorithms are used to solve regression problems, in which the output
variable is continuous and depends on the input variables. These are used to predict continuous
output variables, such as market trends, weather prediction, etc.
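The difference between the two problem types can be illustrated with a minimal sketch using scikit-learn. The tiny datasets, the "number of links" feature and the relation y = 3x + 2 are invented for illustration only and are not taken from the internship project.

```python
# Minimal supervised-learning sketch; all data below is hypothetical.
from sklearn.linear_model import LinearRegression, LogisticRegression

# Classification: label 1 = spam, 0 = not spam
# (assumed feature: number of links in an email)
X_cls = [[0], [1], [2], [8], [9], [10]]
y_cls = [0, 0, 0, 1, 1, 1]
classifier = LogisticRegression()
classifier.fit(X_cls, y_cls)                  # train on labelled inputs
predicted_label = classifier.predict([[12]])[0]

# Regression: continuous output (assumed relation y = 3x + 2)
X_reg = [[1], [2], [3], [4]]
y_reg = [5.0, 8.0, 11.0, 14.0]
regressor = LinearRegression()
regressor.fit(X_reg, y_reg)
predicted_value = regressor.predict([[5]])[0]

print(predicted_label, round(predicted_value, 2))
```

The classifier returns a category label, while the regressor returns a number on a continuous scale.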
• Clustering
• Association
Clustering: The clustering technique is used when we want to find the inherent groups in
the data. It is a way to group objects into clusters such that the objects with the most
similarities remain in one group and have few or no similarities with the objects of other
groups. An example of clustering is grouping customers by their behavior.
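As a rough sketch of grouping customers by behavior, the example below applies K-Means from scikit-learn to a handful of made-up customers. The two features (visits per month, average basket value) and the choice of two clusters are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customers: [visits per month, average basket value]
customers = np.array([
    [2, 10], [3, 12], [2, 11],      # occasional, low-spend shoppers
    [20, 95], [22, 100], [21, 98],  # frequent, high-spend shoppers
])

# No labels are given; the algorithm finds the groups on its own
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(customers)
print(labels)
```

The cluster numbers themselves are arbitrary; what matters is that similar customers receive the same label.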
ii. Computer Science Fundamentals and Programming: This is another basic requirement
for becoming a good machine learning engineer. You need to be familiar with different CS
concepts like data structures, algorithms, space and time complexity, etc.
iii. Machine Learning Algorithms: ML algorithms are divided into 3 common types namely,
Supervised, Unsupervised, and Reinforcement Machine Learning Algorithms. In detail, some
of the common ones include: Naïve Bayes Classifier, K Means Clustering, Support Vector
Machine, Apriori Algorithm, Linear Regression, Logistic Regression, Decision Trees, Random
Forests, etc.
iv. Data Modeling and Evaluation: Data modeling involves understanding the underlying
structure of the data and then finding patterns that are not obvious to the naked eye. For
example, the type of machine learning algorithm to use, such as regression, classification,
clustering, dimensionality reduction, etc., depends on the data.
v. Neural Networks: These use parallel and sequential
computations to analyze or learn from the data. There are many different types of
neural networks like Feedforward Neural Network, Recurrent Neural Network, Convolutional
Neural Network, Modular Neural Network, Radial basis function Neural Network, etc.
CHAPTER 2
Data Pre-processing
2.1 Imputing
The next step of data preprocessing is to handle missing data in the dataset. If our dataset
contains some missing data, it may create a huge problem for our machine learning model. Hence
it is necessary to handle the missing values present in the dataset. Imputation techniques are used
because removing the affected data from the dataset every time is not feasible: it can reduce the size
of the dataset to a large extent, which not only raises concerns about biasing the dataset but also
leads to incorrect analysis. Ways to handle missing data:
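Two common options, dropping incomplete rows and filling each gap with the mean of its column, can be sketched as follows. The small array is hypothetical, and mean imputation is only one of several possible strategies.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical data: columns are [years of experience, salary];
# np.nan marks a missing value
data = np.array([
    [1.0, 30000.0],
    [2.0, np.nan],
    [3.0, 50000.0],
    [np.nan, 60000.0],
])

# Option 1: drop every row that has a missing value (shrinks the dataset)
complete_rows = data[~np.isnan(data).any(axis=1)]

# Option 2: replace each missing value with the mean of its column
imputer = SimpleImputer(strategy="mean")
imputed = imputer.fit_transform(data)

print(complete_rows.shape)
print(imputed)
```

Dropping rows halves this dataset, whereas imputation keeps all four rows.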
2.2 Encoding
2.3 Scaling
Feature scaling is a technique to standardize the independent variables of the dataset within a
specific range. In feature scaling, we put our variables in the same range and on the same scale
so that no variable dominates another.
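A minimal sketch of two widely used scaling methods in scikit-learn is given below. The age and salary values are hypothetical, and which scaler suits a given model is a design choice rather than something fixed by this report.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical features on very different scales: [age in years, salary]
features = np.array([[25, 30000.0], [35, 60000.0], [45, 90000.0]])

# Standardization: each column gets mean 0 and standard deviation 1
standardized = StandardScaler().fit_transform(features)

# Normalization: each column is rescaled to the range [0, 1]
normalized = MinMaxScaler().fit_transform(features)

print(standardized)
print(normalized)
```

After scaling, the salary column no longer dominates the age column simply because its raw numbers are larger.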
➢ Discrete
➢ Continuous
Linear Regression: Linear regression is used for predictive analysis: it makes predictions
for continuous numbers such as salary, age, etc. It models the linear
relationship between the dependent and independent variables and shows how the dependent
variable (y) changes according to the independent variable (x).
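For a single independent variable, the fitted line has the form y = intercept + slope * x, and the least-squares estimates can be computed directly. The sketch below uses NumPy on hypothetical experience and salary figures chosen to lie exactly on a line.

```python
import numpy as np

# Hypothetical data: years of experience (x) and salary in thousands (y)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([35.0, 40.0, 45.0, 50.0, 55.0])

# Least-squares estimates for the line y = intercept + slope * x
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

# Predicted salary for 6 years of experience
prediction = intercept + slope * 6.0
print(slope, intercept, prediction)
```

Here each extra year of experience adds the value of the slope to the predicted salary.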
Support Vector Machine: The SVM algorithm is a supervised learning algorithm that
works by separating data points into different classes with a hyperplane. It maximizes the
margin, i.e. the distance between the hyperplane and the nearest points of each class, to
separate the classes as clearly as possible.
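A small sketch of a linear SVM with scikit-learn follows. The six points are invented and deliberately easy to separate; real data usually needs feature scaling and kernel or parameter tuning.

```python
from sklearn.svm import SVC

# Hypothetical, linearly separable points in two classes
X = [[1, 1], [1, 2], [2, 1], [6, 6], [6, 7], [7, 6]]
y = [0, 0, 0, 1, 1, 1]

# A linear kernel looks for the separating hyperplane with the widest margin
model = SVC(kernel="linear")
model.fit(X, y)

predictions = model.predict([[0, 0], [8, 8]])

# The support vectors are the training points closest to the hyperplane
print(predictions)
print(model.support_vectors_)
```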
Decision Tree Machine Learning Algorithm: Applications of the Decision Tree Machine
Learning Algorithm range from data exploration and pattern recognition to option pricing in finance
and identifying disease and risk trends. Rather than requiring many hand-written if and else
conditions, a decision tree learns such conditions from the data and uses them to make the prediction.
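To show how a tree replaces hand-written conditions, the sketch below trains a small tree and prints the rules it learned. The income and debt figures and the risk labels are hypothetical.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical data: [annual income in thousands, existing debt in thousands]
X = [[20, 15], [25, 20], [30, 5], [60, 10], [80, 5], [90, 30]]
y = ["high-risk", "high-risk", "low-risk", "low-risk", "low-risk", "high-risk"]

tree = DecisionTreeClassifier(random_state=0)
tree.fit(X, y)

# Print the learned rules as nested if/else style conditions
rules = export_text(tree, feature_names=["income", "debt"])
print(rules)
```

Each indented line of the printed output corresponds to one threshold test that the tree chose from the data.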
K-Nearest Neighbors Algorithm: KNN is mainly used in market basket analysis or for retail
analytics, for example to find a similar product which a customer is likely to buy or put in the basket.
Prediction is done on the basis of similarity.
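Prediction by similarity can be sketched with scikit-learn's KNN classifier. The shopper data and the two product categories are assumptions made for this example.

```python
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical shoppers: [snack items bought, household items bought]
X = [[8, 1], [9, 2], [7, 1], [1, 8], [2, 9], [1, 7]]
y = ["snacks", "snacks", "snacks", "household", "household", "household"]

# Classify a new shopper by majority vote among the 3 most similar shoppers
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X, y)

prediction = model.predict([[8, 2]])[0]
print(prediction)
```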
Random Forest ML Algorithm: The random forest algorithm is used in industrial applications
such as finding out whether a loan applicant is low-risk or high-risk, predicting the failure of
mechanical parts in automobile engines and predicting social media share scores and
performance scores.
Variance specifies the amount by which the prediction would change if different training
data were used. In simple words, variance tells how much a random variable differs
from its expected value. Variance errors are of either low or high variance. Low
variance means there is a small variation in the prediction of the target function with changes in
the training dataset, while high variance means a large variation in the prediction
of the target function with changes in the training dataset.
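The effect can be observed numerically by refitting a simple and a flexible model on many noisy versions of the same training set and measuring how much the prediction at one point changes. In the sketch below, the sine target, the noise level, the polynomial degrees and the evaluation point are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 15)
true_y = np.sin(2 * np.pi * x)   # assumed "true" target function
x0 = 0.9                         # point at which predictions are compared


def prediction_variance(degree, trials=200):
    """Variance of the prediction at x0 over many noisy training sets."""
    predictions = []
    for _ in range(trials):
        noisy_y = true_y + rng.normal(scale=0.3, size=x.size)
        coefficients = np.polyfit(x, noisy_y, degree)
        predictions.append(np.polyval(coefficients, x0))
    return np.var(predictions)


low_variance = prediction_variance(degree=1)    # simple model
high_variance = prediction_variance(degree=7)   # flexible model
print(low_variance, high_variance)
```

The flexible model follows the noise in each training set more closely, so its prediction varies more from one training set to the next.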
Regularization: Regularization is a technique used to reduce errors by fitting the function
appropriately on the given training set so as to avoid overfitting. Common techniques include:
• L1 regularization
• L2 regularization
• Dropout regularization
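The first two techniques in the list can be sketched with scikit-learn's Lasso (L1) and Ridge (L2) regressors on synthetic data. The data, the penalty strengths and the choice of only two informative features are assumptions. Dropout applies to neural networks and is not shown here.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
# Hypothetical target: only the first two features matter
y = 4.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=30)

plain = LinearRegression().fit(X, y)   # no penalty
ridge = Ridge(alpha=10.0).fit(X, y)    # L2 penalty shrinks all coefficients
lasso = Lasso(alpha=0.5).fit(X, y)     # L1 penalty can set coefficients to zero

print(np.round(plain.coef_, 2))
print(np.round(ridge.coef_, 2))
print(np.round(lasso.coef_, 2))
```

Comparing the three printed coefficient vectors shows how the penalties pull the coefficients towards zero.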
Performance Metrics in Machine Learning: Evaluating the performance of a Machine
learning model is one of the important steps while building an effective ML model. To evaluate
the performance or quality of the model, different metrics are used, and these metrics are known
as performance metrics or evaluation metrics.
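As an illustration, four common classification metrics can be computed with scikit-learn as sketched below. The label lists are hypothetical.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Hypothetical true labels and model predictions (1 = positive class)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

accuracy = accuracy_score(y_true, y_pred)     # correct predictions / all predictions
precision = precision_score(y_true, y_pred)   # true positives / predicted positives
recall = recall_score(y_true, y_pred)         # true positives / actual positives
f1 = f1_score(y_true, y_pred)                 # harmonic mean of precision and recall

print(accuracy, precision, recall, f1)
```

In this example there are 3 true positives, 1 false positive, 1 false negative and 5 true negatives, giving an accuracy of 0.8 and a precision, recall and F1 score of 0.75 each.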
Error analysis: Error analysis is the process of isolating, observing and diagnosing erroneous ML
predictions, thereby helping to understand pockets of high and low performance of the model. When it
is said that “the model accuracy is 90%”, the accuracy might not be uniform across subgroups of data,
and there might be some input conditions under which the model fails more often. So, it is the next
step from aggregate metrics to a more in-depth review of model errors for improvement. An example
might be that a dog detection image recognition model does better for dogs in an outdoor setting but not
so well in low-lit indoor settings. Of course, this might be due to skewed datasets, and error
analysis helps identify whether such cases impact model performance. Moving from aggregate to
group-wise errors provides a better picture of model
performance.
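The step from an aggregate metric to group-wise errors can be sketched with pandas. The evaluation log below is hypothetical and only mirrors the dog-detection example in the text.

```python
import pandas as pd

# Hypothetical evaluation log for a dog-detection model
# (correct = 1 if the prediction was right, 0 otherwise)
results = pd.DataFrame({
    "setting": ["outdoor"] * 5 + ["indoor_low_light"] * 5,
    "correct": [1, 1, 1, 1, 1, 1, 0, 0, 1, 0],
})

# Aggregate accuracy hides the difference between subgroups
overall_accuracy = results["correct"].mean()

# Accuracy per subgroup shows where the model fails more often
groupwise_accuracy = results.groupby("setting")["correct"].mean()

print(overall_accuracy)
print(groupwise_accuracy)
```

The overall accuracy is 0.7, but it splits into 1.0 for outdoor images and 0.4 for low-light indoor images.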
CHAPTER 3
Salary Prediction Using Machine Learning
3.1 Description
A prediction is sometimes, though not always, based upon knowledge or experience.
Future events are not necessarily certain, so confirmed exact data about the future is in many
cases impossible to obtain; a prediction may nevertheless be useful in preparing plans about probable
developments. In this project, the salary of an employee of an organization is predicted on the
basis of the previous salary growth rate. The history of salaries is observed, and on that basis
the salary of a person after a certain period of time can be calculated automatically. This
helps to see the growth in any field. The system plots the cluster of salary data points and predicts
the salary through a graph, which is built using linear regression and polynomial regression.
This graph helps to predict the salary for any number of years of experience.
• Step 4: If the points do not follow a straight line, we use polynomial regression to fit a
curve. Through the cluster of points we can draw a smooth curved path.
• Step 5: Then, using the linear/polynomial graph, we can read the predicted salary off the
x-y axes.
• Step 6: We can also predict a person's future salary from the graph: take that person's
position on the x-axis, and the prediction is obtained with the
help of the graph.
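Steps 4 to 6 can be sketched in code as follows. The data is synthetic (salary in thousands growing quadratically with experience), and choosing between the line and the curve by comparing R² scores is an assumption of this sketch, not a rule stated in the report.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data: salary (in thousands) grows non-linearly with experience
years = np.arange(1, 11, dtype=float).reshape(-1, 1)
salary = 30.0 + 2.0 * years[:, 0] + 0.5 * years[:, 0] ** 2

# Step 4: compare a straight line with a second-degree polynomial curve
linear_model = LinearRegression().fit(years, salary)
poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly_model.fit(years, salary)
linear_r2 = linear_model.score(years, salary)
poly_r2 = poly_model.score(years, salary)

# Steps 5 and 6: read the salary for a given experience off the better curve
best_model = poly_model if poly_r2 > linear_r2 else linear_model
predicted_salary = best_model.predict([[12.0]])[0]

print(linear_r2, poly_r2, predicted_salary)
```

Wrapping the polynomial transformation and the regression in one pipeline means the same transformation is applied automatically when predicting for a new person.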
CHAPTER 4
LITERATURE SURVEY
1. Andreas Müller, “Introduction to Machine Learning with Python: A Guide for Data
Scientists,” O’Reilly, India.
2. S. Marsland, Machine learning: an algorithmic perspective. CRC press, 2015.
3. A. L. Buczak and E. Guven, “A survey of data mining and machine learning methods for
cyber security intrusion detection,” IEEE Communications Surveys & Tutorials, vol. 18, no. 2,
pp. 1153–1176, Oct., 2015.
4. Tzanis, George, et al. "Modern Applications of Machine Learning." Proceedings of the 1st
Annual SEERC Doctoral Student Conference–DSC. 2006.
5. Horvitz, Eric. "Machine learning, reasoning, and intelligence in daily life: Directions and
challenges." Proceedings of. Vol. 360. 2006.
6. Mitchell, Tom Michael. The discipline of machine learning. Carnegie Mellon University,
School of Computer Science, Machine Learning Department, 2006.
7. Arum, R. (1998). The effects of resources on vocational student educational outcomes:
Invested dollars or diverted dreams? Sociology of Education, 71, 130-151.
8. Lewis, C. D., 1982. Industrial and Business Forecasting Methods, London, Butterworths.
9. Susmita Ray," A Quick Review of Machine Learning Algorithms," 2019 International
Conference on Machine Learning, Big Data, Cloud and Parallel Computing (ComIT-Con),
India, 14th -16th Feb 2019.
10. Sananda Dutta, Airiddha Halder, Kousik Dasgupta,” Design of a novel Prediction Engine
for predicting suitable salary for a job” 2018 Fourth International Conference on Research in
Computational Intelligence and Communication Networks (ICRCICN).
11. Pornthep Khongchai, Pokpong Songmuang, “Improving Students’ Motivation to Study
using Salary Prediction System” 2016 13th International Joint Conference on Computer
Science and Software Engineering (JCSSE)
12. Phuwadol Viroonluecha, Thongchai Kaewkiriya,” Salary Predictor System for Thailand
Labour Workforce using Deep Learning” The 18th International Symposium on
Communications and Information Technologies (ISCIT 2018)
13. Mangui Wu, Shunmin Shu,” Top Management Salary, Stock Ratio and Firm Performance:
A Comparative Study of State-owned and Private Listed Companies in China”
CHAPTER 5
IMPLEMENTATION
5.1 Source Code
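# Import libraries for data handling, plotting and modelling
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Load the dataset: the first column is the feature, the second is the salary
dataset = pd.read_csv('Salary.csv')
print(dataset)
x = dataset.iloc[:, :1].values   # 2-D feature array, shape (n, 1)
y = dataset.iloc[:, 1].values    # 1-D target array, shape (n,)

# Scatter plot of the raw data
plt.scatter(x, y, color='r')
plt.show()

# Hold out 20% of the rows for testing
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

# Fit a simple linear regression and compare predictions with actual values
regressor = LinearRegression()
regressor.fit(x_train, y_train)
y_pred = regressor.predict(x_test)
print(y_pred)
print(y_test)

# Plot the fitted straight line over the data
plt.scatter(x, y, color='r')
plt.plot(x, regressor.predict(x), color='blue')
plt.show()

# Fit a second-degree polynomial regression on the full dataset,
# using a separate model so the linear fit above is not overwritten
poly = PolynomialFeatures(degree=2)
x_poly = poly.fit_transform(x)
poly_regressor = LinearRegression()
poly_regressor.fit(x_poly, y)

# Plot the fitted curve; sort by x so the line is drawn left to right
order = np.argsort(x[:, 0])
plt.scatter(x, y, color='r')
plt.plot(x[order], poly_regressor.predict(x_poly[order]), color='blue')
plt.show()

# Compare the polynomial predictions with the actual salaries
y_pred = poly_regressor.predict(x_poly)
print(y_pred)
print(y)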
Results:
In this Python machine learning project, we built a predictive model to predict the salary
of each employee using various factors. We used linear regression for this and
made use of the sklearn library to prepare the dataset.
• Now we are able to analyze technical requirements and select the most
appropriate solution.
• We are able to describe the impact of technology on society.
CONCLUSION
In this project, I proposed a salary prediction system using a linear regression algorithm
with a second-order polynomial transformation. For proper salary prediction, we identified the 5 most
relevant features. The result of the system was evaluated by comparing the chosen algorithm with
other algorithms in terms of standard scores and curves such as the classification accuracy,
the F1 score, the ROC curve, the Precision-Recall curve, etc. We compared algorithms only for the
basic model, which has only two attributes. Moreover, we continued with the basic model and found
the most appropriate method for adding more attributes, with a highest accuracy of 76%.
REFERENCES
• https://www.analyticsvidhya.com/blog/2021/08/a-quick-guide-to-error-analysis-for-machine-learning-classification-models/
• https://www.javatpoint.com/performance-metrics-in-machine-learning
• https://www.researchgate.net/publication/332440369
• https://pianalytix.com/salary-prediction-model-using-ml/
• https://insideaiml.com/blog/Project-4:-Prediction-of-salary-Based-on-years-of-experience-1121