
VISVESVARAYA TECHNOLOGICAL UNIVERSITY

BELAGAVI – 590018, KARNATAKA.

An Internship Report
On

“FORECASTING PRICE OF CRUDE OIL USING LONG
SHORT-TERM MEMORY (LSTM) BASED ON RNN”
Submitted By
Prajwal K
(4BD17IS062)

Internship carried out at Rove Labs Pvt Ltd, Bengaluru.

Internal Guide: Mr. Puneeth S P, M.Tech., Assistant Professor, Dept of Information Science & Engg
External Guide: Mr. Shreyas Vernekar, Program Director, Rove Labs Pvt Ltd

2021-2022
Department of Information Science and Engineering
Bapuji Institute of Engineering and Technology
Davangere – 577004, Karnataka
Bapuji Institute of Engineering and Technology,
Davangere – 577004, Karnataka.

Department of Information Science and Engineering

CERTIFICATE

This is to certify that Mr. PRAJWAL K, bearing USN 4BD17IS062, of the Information Science and
Engineering department has satisfactorily submitted the internship report entitled “FORECASTING
PRICE OF CRUDE OIL USING LONG SHORT-TERM MEMORY (LSTM) BASED ON RNN”.
The internship report has been approved as it satisfies the academic requirement with respect to the
Internship work prescribed for the Bachelor of Engineering Degree of the Visvesvaraya Technological
University, Belagavi, during the year 2021-22.

Guide: Mr. Puneeth S P, M.Tech., Assistant Professor
Internship Coordinator: Mr. Sheik Imran, M.Tech., Assistant Professor
HOD: Dr. Poornima B, Ph.D., Professor & Head

External Viva

Name of the Examiners Signature with Date

1.

2.
Rove Labs Pvt Ltd
+91-9591071117|office@rovelabs.com
www.rovelabs.com
CIN: U74999KA2017PTC100899
GSTIN: 29AAICR1415R1ZT

INTERNSHIP COMPLETION CERTIFICATE

DATE 10th December 2021


USN 4BD17IS062
CERTIFICATE ID RLML951
ISSUED BY Sumanth

This is to certify that PRAJWAL K, student of the Department of Information Science and
Engineering from Bapuji Institute of Engineering and Technology (BIET), Davangere, has
completed the internship program at Rove Labs Pvt Ltd, Bengaluru, from 23/08/2021 to
08/10/2021 under the mentorship of Shreyas Vernekar.
During the above tenure, she was trained and assigned project work on Machine Learning and
Data Science, for which she has submitted a report that has been evaluated. Her work was graded
as good by the mentor.
We have found her to be a dedicated and hardworking individual, and we wish her the best of luck in
her future endeavors.

ROVE LABS PVT LTD, BENGALURU

SHREYAS S VERNEKAR
Program Director
COMPANY PROFILE

Rove Labs is a visionary ed-tech company with a mission to power your career through education in the
latest technology and to equip today’s younger generation with 21st-century skills. With
our quality curriculum and powerful digital platform, you can now learn from anywhere at
any time. We have trained more than 8 thousand people. Technology and the way we work
are changing so rapidly that it is difficult to find employees with experience in emerging
technologies and work practices. This skill gap is threatening the sustainability of enterprises
around the world. There is a strong need to revamp the path from education to employment and
from employment to talent creation through a carefully designed curriculum, crafted to suit future
technology. Our industry-driven curriculum, robust career guidance services and capable
community of universities, institutions, instructors and employers equip learners with skills
for careers in the 21st century.

Company mission: To promote academic conversations and dialogue that foster creativity and
integrate critical thinking skills within and across all content areas. Our aim is to provide
students with repeated opportunities to practice higher-order thinking and to establish safe,
intellectually risk-free learning environments and resources. Rove Labs consistently cultivates
problem-solving and logical thinking capabilities in students and develops communication and
collaboration among them through sharing ideas and working together.
ACKNOWLEDGEMENT

Salutations to our beloved and highly esteemed institute, "Bapuji Institute of Engineering and
Technology", for having well qualified staff and labs furnished with the necessary equipment.

I express my sincere thanks to our internal guide Mr. Puneeth S P, M.Tech., external guide Mr.
Shreyas Vernekar and Internship coordinator Mr. Sheik Imran for giving constant encouragement,
support and guidance throughout the course of the Internship Seminar. Without their steady guidance
this seminar report would not have been achieved.

I express wholehearted gratitude to Dr. Poornima B, H.O.D. of IS & E. I wish to acknowledge
her valuable help and encouragement, which made my task easy.

I would like to thank our beloved Principal Dr. H.B. Aravind and the Director Prof. Y
Vrushabhendrappa of this Institute for giving me the opportunity and guidance to work on the
Internship Seminar.

I would like to extend my gratitude to all teaching and non-teaching staff of the Department of
Information Science and Engineering for the help and support rendered to me. I have benefited a lot
from the feedback and suggestions given by them.

I would like to extend my gratitude to all my family members and friends for their
advice and moral support.

PRAJWAL K
4BD17IS062

Vision and Mission of the Department
Vision

“To be the center of excellence by adopting technological innovation in academics and research to
develop competent manpower for emerging needs of society and industries.”

Mission

M1: To provide quality education to meet the challenges of technological changes to succeed in their
professional career and higher education.

M2: To inculcate the culture of research, innovation and entrepreneur skills among the students.
M3: To groom our students with the quality of team spirit, leadership skills and ethical values, to share
and apply their knowledge for the benefit of the society.

Program Specific outcomes (PSOs):


PSO1- Problem Solving Skills - Ability to apply standard principles and practices of
Information Technology to propose feasible ideas and solutions to computational tasks
using appropriate tools and techniques.

PSO2- Knowledge of Information Technology – Analyze, design, develop and test the
computer based software in the areas related to networks, cloud computing,
web technology, data science and IoT.

PSO3- Profession and Research Ability – Inculcate the knowledge to excel in IT profession,
entrepreneurship and research with ethical standards.

Program Educational Objectives (PEOs):

PEO1 - The graduates of the program will have excellence through principles and practices of information
technology combined with fundamentals of engineering.

PEO2 - The graduates of the program will be prepared in diverse areas of information science for their
successful careers, entrepreneurship and higher studies.

PEO3 - The graduates of the program will work effectively as an individual and in a team, exhibiting
leadership qualities and communication skills to meet the goals of the organization.

PEO4 - The graduates of the program will grow in their profession with ethics and management principles to
carry societal responsibilities.
Program Outcomes (POs) defined by NBA:

PO1 - Engineering Knowledge: Apply the knowledge of mathematics, science,
engineering fundamentals, and an engineering specialization to the solution of
complex engineering problems.
PO2 - Problem Analysis: Identify, formulate, research literature, and analyze complex
engineering problems reaching substantiated conclusions using first principles of
mathematics, natural sciences, and engineering sciences.
PO3 - Design/development of Solutions: Design solutions for complex engineering
problems and design system components or processes that meet the specified needs
with appropriate consideration for public health and safety and the cultural,
societal and environmental considerations.
PO4 - Conduct Investigations of Complex Problems: Use research-based knowledge
and research methods including design of experiments, analysis and interpretation
of data, and synthesis of the information to provide valid conclusions.
PO5 - Modern Tool Usage: Create, select, and apply appropriate techniques, resources,
and modern engineering and IT tools including prediction and modelling to
complex engineering activities with an understanding of the limitations.
PO6 - The Engineer and Society: Apply reasoning informed by the contextual
knowledge to assess societal, health, safety, legal and cultural issues and the
consequent responsibilities relevant to the professional engineering practice.
PO7 - Environment and Sustainability: Understand the impact of the professional
engineering solutions in societal and environmental contexts, and demonstrate
the knowledge of, and need for sustainable development.
PO8 - Ethics: Apply ethical principles and commit to professional ethics and
responsibilities and norms of engineering practice.
PO9 - Individual and Team Work: Function effectively as an individual, and as a
member or leader in diverse teams, and in multidisciplinary settings.
PO10 - Communication: Communicate effectively on complex engineering activities with
the engineering community and with society at large, such as, being able to
comprehend and write effective reports and design documentation, make
effective presentations, and give and receive clear instructions.
PO11 - Project Management and Finance: Demonstrate knowledge and understanding
of the engineering and management principles and apply these to one’s own work,
as a member and leader in a team, to manage projects and in
multidisciplinary environments.
PO12 - Life-long Learning: Recognize the need for, and have the preparation and ability
to engage in independent and life-long learning in the broadest context of
technological change.
Bapuji Educational Association. Regd.
Bapuji Institute of Engineering and Technology,
Davangere – 577004, Karnataka.
Department of Information Science and Engineering

Subject name/Code: Internship / Professional practice


Semester: 8th semester, EVEN 2021-2022.

COURSE OUTCOMES

At the end of Internship/Professional practice, students will be able to:

CO1: Adapt easily to the industry environment by working as a team on various modern tools.

CO2: Explore career alternatives prior to graduation.

CO3: Develop communication skills, professional skills & interpersonal skills.

CO4: Acquire employment contacts leading directly to a full-time job following graduation from college.

CO5: Adapt to ethical values and to the lifelong learning process.

ABSTRACT
Machine learning (ML) is a technology that allows a software program to become more
accurate at predicting outcomes without being explicitly programmed. ML
algorithms use historical data to predict new outputs, and for this reason ML has received considerable
attention. Prediction engines have become popular because they generate accurate and
affordable predictions, much as a human would, and they are being used in industry to solve many problems.
Predicting a justified salary for an employee has always been a challenging job for an employer. This
project proposes a salary prediction model that uses a suitable algorithm and the key features required to
predict the salary of an employee. The goal is to predict the salary of a person after a certain
number of years. The prediction is presented graphically: a computerized system maintains
the salary growth graph for any field and
predicts the salary after a given time period.

CONTENTS
Acknowledgement
Vision, Mission, POs, PSOs, PEOs
Course Outcomes
Abstract
Chapter 1. Introduction
1.1 Industry 4.0
1.2 Cognitive Computing
1.3 AI vs ML vs DL
1.4 Machine Learning and its Types
1.5 Skills Required and Tools used in ML
1.6 Challenges in ML
Chapter 2. Data Preprocessing
2.1 Imputing
2.2 Encoding
2.3 Scaling
2.4 Transformation
2.5 Outlier Handling
2.6 Different Algorithms in ML and its use cases
2.7 Other terminologies used in ML
Chapter 3. Salary Prediction Project Using ML
3.1 Description
3.2 Problem Statement
3.3 Proposed System
3.4 Methodology Used
Chapter 4. Literature Survey
Chapter 5. Implementation
5.1 Source Code
5.2 Snapshots
Results
Learning Outcomes
Conclusion
References
Internship on Machine Learning
CHAPTER 1
INTRODUCTION

1.1 Industry 4.0

Before Industry 4.0, three industrial revolutions led to
changes of paradigm in the domain of manufacturing: mechanization through water and steam
power, mass production on assembly lines, and automation using information technology.
Industry 1.0 began around the 1780s with the introduction of water and steam power, which
enabled mechanical production and greatly improved the agriculture sector. Industry
2.0 is defined as the period when mass production was introduced as the primary means of
production. The mass production of steel helped introduce railways into the
industrial system, which in turn contributed to mass production at large. During the 20th
century, Industry 3.0 arose with the advent of the Digital Revolution. It is more familiar
than Industry 1.0 and 2.0, as most people living today are familiar with industries
that lean on digital technologies in production. Industry 3.0 was, and still is, a direct
result of the huge development of the computer and information and communication technology
industries in many countries (Liao et al., 2017). Industry 4.0 has brought change to many
professions. People have always been obliged to learn new everyday tasks, but they are now also
compelled to use hi-tech gadgets, which are fast becoming the most important factor in their
working life (Gorecky et al., 2014).

1.2 Cognitive Computing


In the middle and later periods of the 20th century, the trend of behaviorism gradually
declined. The rapid development of linguistics, information theory and data science, as well as
the popularization of computer technologies, brought about an impressive and thought-provoking
cognitive revolution. Out of it emerged cognitive science, an interdisciplinary
subject that studies the circulation and treatment of information in the human brain. Cognitive
scientists explore the mental abilities of human beings through observation of aspects such as
language, perception, memory, attention, reasoning and emotion. The cognitive process of
human beings is mainly reflected in the following two stages.

Dept of IS&E, BIET, Davangere Page|1


1.3 AI vs ML vs DL
In this new era of technology, companies and developers around the world are talking
about embracing artificial intelligence (AI), machine learning (ML), and deep learning (DL).
These terms are often used loosely in the field of technology. It is important to
understand that ML and DL both fall under the umbrella of Artificial Intelligence (AI).

Fig. 1: Representation of AI, ML & DL


Artificial Intelligence: Artificial Intelligence is the mechanism of incorporating human
intelligence into machines through a set of rules (algorithms). AI is a combination of two words:
“Artificial”, meaning something made by humans or non-natural, and “Intelligence”,
meaning the ability to understand or think accordingly.

Machine Learning: Machine Learning is the study of methods that enable a
system to learn automatically from its experience and improve accordingly
without being explicitly programmed.

Deep Learning: Deep Learning is a sub-part of the broader family of Machine
Learning which makes use of neural networks (loosely modelled on the neurons in our brain) to
mimic human brain-like behaviour.

1.4 Machine Learning and its Types

Fig. 2: Types of Machine Learning

1.4.1 Supervised Machine Learning

In the supervised learning technique, we train the machine using a "labelled"
dataset, and based on that training, the machine predicts the output. More precisely,
we first train the machine with inputs and their corresponding outputs, and then we ask the
machine to predict the output for the test dataset.

Supervised machine learning can be classified into two types of problems, which are given
below:

• Classification

• Regression

Classification: Classification algorithms predict which of the categories present in the dataset
a new input belongs to. Some real-world examples of classification are spam detection and email
filtering.

Regression: Regression algorithms are used to solve problems in which the output variable is
continuous and depends on the input variables; linear regression is the special case in which that
relationship is linear. They are used to predict continuous outputs, such as market trends or weather.


1.4.2 Unsupervised Machine Learning


In unsupervised learning, the models are trained with data that is neither classified nor
labelled, and the model acts on that data without any supervision. Unsupervised learning can
be further classified into two types, which are given below:

• Clustering
• Association
Clustering: The clustering technique is used when we want to find the inherent groups in
the data. It is a way to group objects into clusters such that the objects with the most
similarities remain in one group and have few or no similarities with the objects of other
groups. An example of clustering is grouping customers by their behavior.

Association: Association rule learning is an unsupervised learning technique that finds
interesting relations among variables within a large dataset. The main aim of this learning
algorithm is to find the dependency of one data item on another data item and to map those
variables accordingly, so that the relationship can be exploited, for example to maximize profit.
1.4.3 Semi-Supervised Machine Learning
The main aim of semi-supervised learning is to use all the available data effectively, rather than
only the labelled data as in supervised learning. Initially, similar data is clustered using an
unsupervised learning algorithm, and the clusters then help to label the unlabelled data.
1.4.4 Reinforcement Learning
Reinforcement learning works on a feedback-based process, in which an AI agent (a software
component) automatically explores its surroundings by trial and error: taking actions, learning
from experience, and improving its performance.


1.5 Skills Required and Tools used in ML


i. Applied Mathematics: Many ML algorithms are applications derived from statistical
modeling procedures, so they are much easier to understand if you have a strong foundation
in Mathematics.

ii. Computer Science Fundamentals and Programming: This is another basic requirement
for becoming a good machine learning engineer. You need to be familiar with CS
concepts such as data structures, algorithms, and space and time complexity.

iii. Machine Learning Algorithms: ML algorithms are divided into 3 common types, namely
Supervised, Unsupervised, and Reinforcement Machine Learning Algorithms. Some
of the common ones are: Naïve Bayes Classifier, K Means Clustering, Support Vector
Machine, Apriori Algorithm, Linear Regression, Logistic Regression, Decision Trees and Random
Forests.

iv. Data Modeling and Evaluation: Data modeling involves understanding the underlying
structure of the data and then finding patterns that are not obvious to the naked eye. For
example, the type of machine learning algorithm to use (regression, classification,
clustering, dimension reduction, etc.) depends on the data.

v. Neural Networks: Neural networks use parallel and sequential
computations to analyze or learn from the data. There are many different types of
neural networks, such as Feedforward Neural Network, Recurrent Neural Network, Convolutional
Neural Network, Modular Neural Network and Radial Basis Function Neural Network.

1.6 Challenges in Machine Learning


Machine learning professionals face many challenges when building ML skills
and creating an application from scratch. Some of them are:

• Poor quality of data
• Underfitting of training data
• Overfitting of training data
• The complexity of the machine learning process itself


CHAPTER 2
Data Pre-processing
2.1 Imputing

The next step of data preprocessing is to handle missing data in the dataset. If our dataset
contains missing data, it may create a huge problem for our machine learning model. Hence it is
necessary to handle the missing values present in the dataset. Imputing techniques, which fill in
the missing values, are used because removing the affected data from the dataset every time is not
feasible: it can reduce the size of the dataset to a large extent, which not only raises
concerns about biasing the dataset but also leads to incorrect analysis. Missing data can therefore
be handled either by deleting the affected records or by imputing the missing values.
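
As a minimal sketch of imputation, assuming a single numeric column in which missing entries are stored as None, the gaps can be filled with the mean of the observed values. The function name impute_mean and the sample figures are hypothetical and are not taken from the project dataset.

```python
from statistics import mean

def impute_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical salary column with two missing entries.
salaries = [30000, None, 50000, 40000, None]
imputed = impute_mean(salaries)
print(imputed)
```

Mean imputation keeps every record in the dataset, at the cost of reducing the spread of the imputed column.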

2.2 Encoding

Fig. 3: Encoding Techniques

Encoding is a technique for converting categorical variables into numerical values so that they
can be easily fitted to a machine learning model.
Categorical features are generally divided into the following types:
• Nominal
• Ordinal
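
The two kinds of feature listed above are usually encoded differently. The sketch below, in plain Python, assumes one-hot encoding for a nominal feature (no natural order) and an integer rank for an ordinal feature (ordered levels). The helper names one_hot and ordinal_encode and the sample values are illustrative assumptions.

```python
def one_hot(values):
    """Encode a nominal feature as one 0/1 column per category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return rows, categories

def ordinal_encode(values, order):
    """Encode an ordinal feature as its position in the given order."""
    rank = {label: i for i, label in enumerate(order)}
    return [rank[v] for v in values]

# Nominal: city names have no order, so each city gets its own column.
encoded_cities, city_columns = one_hot(["Davangere", "Bengaluru", "Davangere"])

# Ordinal: the levels have a natural order, so a single integer is enough.
levels = ordinal_encode(["low", "high", "medium"], ["low", "medium", "high"])
print(city_columns, encoded_cities, levels)
```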

2.3 Scaling
Scaling is a technique to standardize the independent variables of the dataset to a specific range. In
feature scaling, we put our variables in the same range and on the same scale so that no
variable dominates the others.
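
Two common ways of doing this are min-max scaling to the range [0, 1] and standardization to zero mean and unit standard deviation. The following is a minimal sketch in plain Python; the function names and the sample column are assumptions made for illustration.

```python
from statistics import mean, pstdev

def min_max_scale(values):
    """Rescale values linearly so the smallest becomes 0 and the largest 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # a constant column carries no information
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Shift and rescale values to zero mean and unit standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

experience = [1, 3, 5, 9]  # hypothetical years of experience
scaled = min_max_scale(experience)
z_scores = standardize(experience)
print(scaled, z_scores)
```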

2.4 Transformation
Data transformation is the process of converting raw data into a format or structure that is
more suitable for model building and for data discovery in general. All machine learning
algorithms are based on mathematics, so we need to convert all the columns into a numerical
format. Taking a broader perspective, data is classified into numerical and categorical data:
• Numerical: As the name suggests, this is numeric data that is quantifiable.
• Categorical: This is string or other non-numeric data that is qualitative in nature.

Numerical data is further divided into the following:

➢ Discrete

➢ Continuous

Categorical data is further divided into the following:


➢ Ordered
➢ Nominal

2.5 Outlier Handling


Outliers are data points that are distant from the rest. They may be due to variability in the
measurement or may indicate experimental errors. If possible, outliers should be excluded from
the dataset. However, detecting such anomalous instances can be difficult and is not always
possible.
Three different methods of dealing with outliers are:
• Univariate method: This method looks for data points with extreme values on one variable.
• Multivariate method: Here, we look for unusual combinations of all the variables.
• Minkowski error: This method reduces the contribution of potential outliers in the training
process.
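
The univariate method can be sketched with the interquartile range (IQR) rule, which is one common choice and is assumed here rather than taken from the report: a value is flagged when it lies more than k times the IQR below the first quartile or above the third quartile. The function name and the data are hypothetical.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

data = [10, 12, 11, 13, 12, 11, 95]  # one value is far from the rest
outliers = iqr_outliers(data)
print(outliers)
```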


2.6 Different Algorithms in ML and its use cases


Naive Bayes Algorithm: The Naive Bayes algorithm is a supervised learning algorithm that is
based on Bayes' theorem and used for solving classification problems. It is mainly used in text
classification, which involves high-dimensional training datasets. It is a probabilistic classifier,
which means it predicts on the basis of the probability that an object belongs to a class.

Linear Regression: Linear regression is a predictive analysis technique that makes predictions
for continuous quantities such as salary or age. It models a linear
relationship between the dependent and independent variables and shows how the dependent
variable (y) changes according to the independent variable (x).
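
For a single independent variable, the least-squares line y = slope * x + intercept has a closed-form solution. The sketch below computes it in plain Python; the function name fit_line and the salary figures (in thousands) are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

years = [1, 2, 3, 4, 5]
salary = [35, 40, 45, 50, 55]  # hypothetical values, in thousands
slope, intercept = fit_line(years, salary)
predicted = slope * 6 + intercept  # prediction for 6 years of experience
print(slope, intercept, predicted)
```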

K-Means Clustering Algorithm: The K-Means clustering algorithm is an unsupervised machine
learning algorithm that is used in cluster analysis. It works by partitioning unlabelled data
into a number of different groups, 'k' being the number of groups.
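
A minimal one-dimensional sketch of the k-means loop is given below: each point is assigned to its nearest centroid, and each centroid is then moved to the mean of its points. To keep the example deterministic, the starting centroids are passed in by the caller, which is an assumption of this sketch (practical implementations usually choose them at random). The names and data are hypothetical.

```python
def k_means_1d(points, centroids, iterations=10):
    """Cluster 1-D points around len(centroids) centres."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

spend = [1, 2, 3, 10, 11, 12]  # hypothetical customer spend values
centroids, clusters = k_means_1d(spend, centroids=[1, 12])
print(centroids, clusters)
```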

Support Vector Machine: The SVM algorithm is a supervised learning algorithm that
works by separating the data into different classes with a hyperplane. It chooses the
hyperplane that maximizes the margin, that is, the distance between the hyperplane and the
nearest points of each class, to provide a clear distinction between the classes.
Decision Tree Machine Learning Algorithm: Applications of the decision tree
algorithm range from data exploration and pattern recognition to option pricing in finance
and identifying disease and risk trends. Rather than requiring many hand-written if and else
conditions, a decision tree learns the conditions from the data and follows them to a prediction.
K-Nearest Neighbors Algorithm: KNN is used, for example, in retail analytics to find
products similar to those a customer is likely to buy or put in the basket.
A prediction for a new point is made on the basis of its similarity to the k closest known points.
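
Prediction by similarity can be sketched as follows, assuming Euclidean distance and a majority vote among the k nearest labelled points. The function name knn_predict, the feature values and the labels are hypothetical.

```python
from collections import Counter
from math import dist

def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query.

    train is a list of (features, label) pairs.
    """
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical two-feature descriptions of past purchases.
purchases = [
    ((1.0, 1.0), "snacks"),
    ((1.2, 0.8), "snacks"),
    ((0.9, 1.1), "snacks"),
    ((5.0, 5.0), "electronics"),
    ((5.5, 4.5), "electronics"),
]
label = knn_predict(purchases, (1.1, 1.0))
print(label)
```
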
Random Forest ML Algorithm: The random forest algorithm is used in industrial applications
such as finding out whether a loan applicant is low-risk or high-risk, predicting the failure of
mechanical parts in automobile engines and predicting social media share scores and
performance scores.

2.7 Other Terminologies Used in ML
Underfitting: A statistical model or a machine learning algorithm is said to underfit
when it cannot capture the underlying trend of the data. Underfitting reduces the accuracy of
our machine learning model.
Techniques to reduce underfitting:
• Increase model complexity.
• Increase the number of features by performing feature engineering.
• Remove noise from the data.
• Increase the number of epochs or the duration of training to get better results.
Overfitting: A statistical model is said to be overfitted when it fits the training data too
closely, typically because the model is too complex for the amount of data or is trained for
too long. It then starts learning from the noise and inaccurate data entries in the data set.
As a result, the model does not categorize new data correctly, because it has learned too many
details and too much noise.

Techniques to reduce overfitting:

• Increase training data.
• Reduce model complexity.
• Early stopping during the training phase (keep an eye on the validation loss over the training
period and stop training as soon as it begins to increase).
• Ridge Regularization and Lasso Regularization.
• Use dropout for neural networks to tackle overfitting.
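
The early stopping rule in the list above can be sketched as follows, assuming the validation loss is recorded once per epoch and that training stops after the loss has failed to improve for a fixed number of epochs (called patience here). The function name, the patience parameter and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the index of the best epoch seen before training is stopped.

    Training stops once the validation loss has not improved for
    `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break  # stop training; later epochs are never run
    return best_epoch

losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.40]  # hypothetical values
stop_at = early_stopping_epoch(losses)
print(stop_at)
```

A larger patience tolerates short plateaus in the loss at the cost of more training time.
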
Bias and variance: Machine learning is a branch of Artificial Intelligence that allows
machines to perform data analysis and make predictions. However, if the machine learning
model is not accurate, it makes prediction errors, and these prediction errors are usually
described in terms of bias and variance. The main aim of ML/data science analysts is to reduce these errors
in order to get more accurate results. With respect to bias, a model has either:
• Low Bias: A low bias model makes fewer assumptions about the form of the target
function.
• High Bias: A model with a high bias makes more assumptions, and the model becomes
unable to capture the important features of our dataset.


Variance specifies how much the prediction would change if different training
data were used. In simple words, variance tells how much a random variable differs
from its expected value. A model has either low variance or high variance. Low
variance means there is a small variation in the prediction of the target function with changes in
the training dataset, while high variance means a large variation in the prediction
of the target function with changes in the training dataset.

Regularization: Regularization is a technique used to reduce errors by fitting the function
appropriately on the given training set, so that overfitting is avoided.

The commonly used regularization techniques are:

• L1 regularization
• L2 regularization
• Dropout regularization
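
As a sketch of L2 (ridge) regularization for a single feature, assume the penalty lam * slope**2 is added to the squared error and the intercept is left unpenalized. The least-squares slope then becomes Sxy / (Sxx + lam), so a larger lam shrinks the slope towards zero. The function name and the data are hypothetical.

```python
def ridge_line(xs, ys, lam):
    """Fit y = slope * x + intercept with an L2 penalty lam * slope**2."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / (sxx + lam)  # lam = 0 gives ordinary least squares
    intercept = mean_y - slope * mean_x
    return slope, intercept

years = [1, 2, 3, 4, 5]
salary = [35, 40, 45, 50, 55]  # hypothetical values, in thousands
plain_slope, _ = ridge_line(years, salary, lam=0)
shrunk_slope, _ = ridge_line(years, salary, lam=10)
print(plain_slope, shrunk_slope)
```
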
Performance Metrics in Machine Learning: Evaluating the performance of a Machine
learning model is one of the important steps while building an effective ML model. To evaluate
the performance or quality of the model, different metrics are used, and these metrics are known
as performance metrics or evaluation metrics.
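
A few widely used metrics can be written in a handful of lines. The sketch below assumes mean absolute error and root mean squared error for regression and accuracy for classification; the report does not state which metrics were used, and the sample values are hypothetical.

```python
from math import sqrt

def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def root_mean_squared_error(actual, predicted):
    """Square root of the average squared difference."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def accuracy(actual_labels, predicted_labels):
    """Fraction of labels predicted correctly."""
    hits = sum(a == p for a, p in zip(actual_labels, predicted_labels))
    return hits / len(actual_labels)

mae = mean_absolute_error([40, 50, 60], [42, 47, 60])
rmse = root_mean_squared_error([40, 50, 60], [42, 47, 60])
acc = accuracy(["spam", "ham", "spam", "ham"], ["spam", "ham", "ham", "ham"])
print(mae, rmse, acc)
```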

Error analysis: The process of isolating, observing and diagnosing erroneous ML predictions,
which helps in understanding pockets of high and low performance of the model. When it is said that “the
model accuracy is 90%”, the accuracy might not be uniform across subgroups of data, and there might be
some input conditions for which the model fails more often. Error analysis is therefore the next step from
aggregate metrics to a more in-depth review of model errors for improvement. For example, a dog
detection image recognition model might perform better for dogs in an outdoor setting than
in low-lit indoor settings. This might be due to skewed datasets, and error
analysis helps identify whether such cases impact model performance. The below illustration provides
a view of how moving from aggregate to group-wise errors provides a better picture of model
performance.
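
The move from an aggregate metric to group-wise errors can be sketched in a few lines, assuming each prediction is tagged with the subgroup it belongs to. The group names and records below are hypothetical and only mirror the dog detection example.

```python
from collections import defaultdict

def accuracy_by_group(records):
    """Compute accuracy per group from (group, actual, predicted) records."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, actual, predicted in records:
        total[group] += 1
        correct[group] += int(actual == predicted)
    return {group: correct[group] / total[group] for group in total}

records = [
    ("outdoor", "dog", "dog"),
    ("outdoor", "dog", "dog"),
    ("outdoor", "no dog", "no dog"),
    ("outdoor", "dog", "dog"),
    ("indoor low light", "dog", "no dog"),
    ("indoor low light", "dog", "dog"),
]
overall = sum(a == p for _, a, p in records) / len(records)
per_group = accuracy_by_group(records)
print(overall, per_group)
```

In this made-up sample the overall accuracy hides the fact that every error falls in one subgroup.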


CHAPTER 3
Salary Prediction Using Machine Learning
3.1 Description
A prediction is often, though not always, based upon knowledge or experience.
Future events are not certain, so exact data about the future is in many
cases impossible to obtain; a prediction may nevertheless help in preparing plans for probable
developments. In this project, the salary of an employee of an organization is predicted on
the basis of the previous salary growth rate. The salary history is observed and, on that
basis, the salary of a person after a certain period of time is calculated automatically. This
helps to show the salary growth in any field. The program takes the cluster of salary data
points, fits linear regression and polynomial regression models to them, and plots the result as a
graph. This graph is then used to predict the salary for any number of years of experience.

It helps the employee in the following ways:

• It shows the salary growth in any field.
• With the help of machine learning, the graph is produced easily.
• The salary is easy to estimate from the x-y axes of the graph.
• The user can enter any point to get the corresponding salary from the program.
• The salaries of employees can be observed in order to place them in a particular field according to
their qualifications. The linear and polynomial regression graphs are displayed
so that the salary can be read from them.

3.2 Problem Statement


To build a machine learning model that predicts the salary of employees based on years of
experience. The goal of this project is to predict the salary of a person after a certain number of years.

3.3 Proposed System


The proposed system is a machine learning model that predicts the salary of employees based on
years of experience, so that the salary of a person after a certain number of years can be
estimated. The prediction is presented graphically: a computerized system maintains the
salary growth graph for any field and
predicts the salary after a given time period.


3.4 Methodology Used

Fig. 4: System Architectural Diagram


• Step 1: The salary data are taken from the dataset.
• Step 2: The points corresponding to each person's salary data are plotted on the graph.
The data are loaded into pandas in whatever order they arrive (ascending, descending, or
mixed), and the points are plotted from the pandas columns in the order in which they
appear in the real dataset.
• Step 3: Linear regression is then used to fit a straight line through the points.
• Step 4: If the points do not follow a linear trend, polynomial regression is used to fit a
curve instead. This gives a smooth curved path through the cluster of points.
• Step 5: The salary can then be predicted from the linear or polynomial graph by reading
along the x-y axes.
• Step 6: A person's future salary is also predicted by following the graph. Only that
person's position on the x-axis is needed, and the prediction is then obtained with the
help of the graph.
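
The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the project's actual code: the data are synthetic stand-ins for the salary dataset, the rule for choosing between the two fits (higher R² on a held-out split) is an assumption, and `predict_salary` is a hypothetical helper name.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Step 1 (assumed data): synthetic years of experience and salaries with a
# mild upward curve plus noise, standing in for the real dataset.
rng = np.random.default_rng(0)
years = np.sort(rng.uniform(0.5, 12.0, size=40)).reshape(-1, 1)
salary = (30000 + 6000 * years[:, 0] + 250 * years[:, 0] ** 2
          + rng.normal(0, 2000, size=40))

# Hold out 20% of the points so the two fits can be compared on unseen data.
x_train, x_test, y_train, y_test = train_test_split(
    years, salary, test_size=0.2, random_state=0)

# Step 3: straight-line fit. Step 4: degree-2 polynomial fit.
linear_model = LinearRegression().fit(x_train, y_train)
poly_model = make_pipeline(PolynomialFeatures(degree=2),
                           LinearRegression()).fit(x_train, y_train)

# Assumed selection rule: keep whichever model has the higher held-out R^2.
linear_r2 = linear_model.score(x_test, y_test)
poly_r2 = poly_model.score(x_test, y_test)
best_model = poly_model if poly_r2 > linear_r2 else linear_model


def predict_salary(model, years_of_experience):
    """Steps 5-6: return the predicted salary for one value on the x-axis."""
    return float(model.predict(np.array([[years_of_experience]]))[0])


predicted = predict_salary(best_model, 6.0)
print(f"linear R^2 = {linear_r2:.3f}, polynomial R^2 = {poly_r2:.3f}")
print(f"predicted salary at 6 years = {predicted:.0f}")
```

Wrapping the polynomial transformation and the regression in one pipeline means the same transformation is applied automatically at prediction time, so a single number of years can be passed in directly.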


CHAPTER 4
LITERATURE SURVEY
1. Andreas Müller, "Introduction to Machine Learning with Python: A Guide for Data
Scientists," O'Reilly, India.
2. S. Marsland, Machine Learning: An Algorithmic Perspective. CRC Press, 2015.
3. A. L. Buczak and E. Guven, "A survey of data mining and machine learning methods for
cyber security intrusion detection," IEEE Communications Surveys & Tutorials, vol. 18, no. 2,
pp. 1153–1176, Oct. 2015.
4. Tzanis, George, et al. "Modern Applications of Machine Learning." Proceedings of the 1st
Annual SEERC Doctoral Student Conference–DSC. 2006.
5. Horvitz, Eric. "Machine learning, reasoning, and intelligence in daily life: Directions and
challenges." Proceedings of. Vol. 360. 2006.
6. Mitchell, Tom Michael. The Discipline of Machine Learning. Carnegie Mellon University,
School of Computer Science, Machine Learning Department, 2006.
7. Arum, R. (1998). The effects of resources on vocational student educational outcomes:
Invested dollars or diverted dreams? Sociology of Education, 71, 130-151.
8. Lewis, C. D., 1982. Industrial and Business Forecasting Methods, London, Butterworths.
9. Susmita Ray, "A Quick Review of Machine Learning Algorithms," 2019 International
Conference on Machine Learning, Big Data, Cloud and Parallel Computing (ComIT-Con),
India, 14th-16th Feb 2019.
10. Sananda Dutta, Airiddha Halder, Kousik Dasgupta, "Design of a novel Prediction Engine
for predicting suitable salary for a job," 2018 Fourth International Conference on Research in
Computational Intelligence and Communication Networks (ICRCICN).
11. Pornthep Khongchai, Pokpong Songmuang, "Improving Students' Motivation to Study
using Salary Prediction System," 2016 13th International Joint Conference on Computer
Science and Software Engineering (JCSSE).
12. Phuwadol Viroonluecha, Thongchai Kaewkiriya, "Salary Predictor System for Thailand
Labour Workforce using Deep Learning," The 18th International Symposium on
Communications and Information Technologies (ISCIT 2018).
13. Mangui Wu, Shunmin Shu, "Top Management Salary, Stock Ratio and Firm Performance:
A Comparative Study of State-owned and Private Listed Companies in China."


CHAPTER 5
IMPLEMENTATION
5.1 Source Code

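import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Load the dataset: the first column is the feature, the rest is the target (salary).
dataset = pd.read_csv('Salary.csv')
print(dataset)
x = dataset.iloc[:, :1].values
y = dataset.iloc[:, 1:].values

# Scatter plot of the raw data.
fig = plt.figure()
ax = fig.add_axes([0, 0, 1, 1])
ax.scatter(x, y, color='r')
plt.show()

# Hold out 20% of the rows for testing.
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

# Linear regression: fit on the training split, then predict the test split.
lin_reg = LinearRegression()
lin_reg.fit(x_train, y_train)
y_pred = lin_reg.predict(x_test)
print(y_pred)
print(y_test)

# Sort x before drawing fitted lines so they run left to right.
x_sorted = x[np.argsort(x[:, 0])]
plt.scatter(x, y, color='r')
plt.plot(x_sorted, lin_reg.predict(x_sorted), color='blue')
plt.show()

# Degree-2 polynomial regression, fitted on the full dataset with a separate model.
poly = PolynomialFeatures(degree=2)
x_poly = poly.fit_transform(x)
poly_reg = LinearRegression()
poly_reg.fit(x_poly, y)
plt.scatter(x, y, color='r')
plt.plot(x_sorted, poly_reg.predict(poly.transform(x_sorted)), color='blue')
plt.show()
y_pred_poly = poly_reg.predict(x_poly)
print(y_pred_poly)
print(y)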
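
The listing above prints predictions next to the true values but does not compute an error measure. As a hedged sketch of how the two fits could be scored, the block below reports mean absolute error and R² on a held-out split. The helper name `evaluate_models` and the synthetic demo data are assumptions for illustration and are not part of the project code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures


def evaluate_models(x, y, test_size=0.2, random_state=0):
    """Fit a linear and a degree-2 polynomial model on a training split and
    return the held-out MAE and R^2 of each (hypothetical helper)."""
    x_train, x_test, y_train, y_test = train_test_split(
        x, y, test_size=test_size, random_state=random_state)
    models = {
        "linear": LinearRegression(),
        "polynomial_degree_2": make_pipeline(
            PolynomialFeatures(degree=2), LinearRegression()),
    }
    scores = {}
    for name, model in models.items():
        model.fit(x_train, y_train)
        y_pred = model.predict(x_test)
        scores[name] = {
            "mae": mean_absolute_error(y_test, y_pred),
            "r2": r2_score(y_test, y_pred),
        }
    return scores


# Synthetic stand-in for Salary.csv (assumed values, for demonstration only).
rng = np.random.default_rng(42)
x_demo = np.linspace(1, 10, 30).reshape(-1, 1)
y_demo = 25000 + 9000 * x_demo[:, 0] + rng.normal(0, 1500, size=30)

demo_scores = evaluate_models(x_demo, y_demo)
for name, metrics in demo_scores.items():
    print(f"{name}: MAE = {metrics['mae']:.0f}, R^2 = {metrics['r2']:.3f}")
```

With the real data, `x` and `y` from the listing above could be passed in place of the demo arrays. Scoring on the held-out rows, rather than on the rows used for fitting, gives a fairer comparison between the straight line and the curve.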

5.2 Snapshots

Fig. 5: Sample Dataset


Fig. 6: Graphical Representation of Dataset

Fig. 7: Prediction of Dataset

Fig. 8: Testing Dataset


Fig. 9: Results of Linear Regression for the Salary Dataset

Fig. 10: Results of Polynomial Regression for the Dataset


Fig. 11: Salary Prediction

Fig. 12: Salary Prediction

Results:
In this Python machine learning project, we built a predictive model and used it to predict
each employee's salary from years of experience. We used linear regression for this and
made use of the sklearn library to split the dataset and fit the model.


Learning Outcomes
• The internship helped us with self-discovery, career exploration, and professional
preparation.
• Learned to apply knowledge and skills relevant to the area of study through co-worker
interactions, group work, and assigned tasks.
• Learned to explain industrial training experiences using oral and written presentation skills.
• Learned to demonstrate a professional attitude towards work and responsibility.
• Learned to follow instructions and accomplish tasks using proper tools and techniques.
• Took initiative to solve problems and enhanced our powers of observation.
• We are now able to analyze technical requirements and select the most appropriate
solution.
• We are able to describe the impact of technology on society.
CONCLUSION
In this project, I proposed a salary prediction system that uses a linear regression algorithm
with a second-order polynomial transformation. For accurate salary prediction, we identified the
5 most relevant features. The algorithm was selected by comparing it with other algorithms in
terms of standard scores and curves such as the classification accuracy, the F1 score, the ROC
curve, and the Precision-Recall curve. We compared algorithms only for the basic model, which
has only two attributes. We then continued with the basic model and found the most appropriate
method for adding more attributes, reaching a highest accuracy of 76%.
REFERENCES
• https://www.analyticsvidhya.com/blog/2021/08/a-quick-guide-to-error-analysis-for-machine-learning-classification-models/
• https://www.javatpoint.com/performance-metrics-in-machine-learning
• https://www.researchgate.net/publication/332440369
• https://pianalytix.com/salary-prediction-model-using-ml/
• https://insideaiml.com/blog/Project-4:-Prediction-of-salary-Based-on-years-of-experience-1121
