
Business Intelligence and Analytics

Project Report

By: M.Vishnu (RA2111027020008)


CSE BDA-A
Jobs and Salaries in Data Science

Aim: To present a report on jobs and salaries in data science across different countries.

Problem Statement:
Different industries require distinct skill sets within the data science domain. However, professionals
and employers may lack insights into these specific requirements, leading to misalignment in hiring
and career development efforts.

Feature Selection:
The feature I have selected from this dataset is experience level (the experience_level column), which
classifies the professional experience level of the employee. Common categories might include 'Entry-level',
'Mid-level', 'Senior', and 'Executive', providing insight into how experience influences salary in data-related roles.

Importing the libraries:


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

Exploring the data:


# Use a name other than "set", which would shadow Python's built-in set type.
csv_path = r"C:\Users\M.V.SUBBARAO\Downloads\archive\jobs_in_data.csv"
Data = pd.read_csv(csv_path)
print(Data.head())
Data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9355 entries, 0 to 9354
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 work_year 9355 non-null int64
1 job_title 9355 non-null object
2 job_category 9355 non-null object
3 salary_currency 9355 non-null object
4 salary 9355 non-null int64
5 salary_in_usd 9355 non-null int64
6 employee_residence 9355 non-null object
7 experience_level 9355 non-null object
8 employment_type 9355 non-null object
9 work_setting 9355 non-null object
10 company_location 9355 non-null object
11 company_size 9355 non-null object
dtypes: int64(3), object(9)
memory usage: 877.2+ KB

Data Cleaning:

The salary in USD (salary_in_usd) is enough for the analysis, so the local-currency columns are dropped.

Data.drop(columns=["salary_currency","salary"],inplace=True)

counts = Data["company_location"].value_counts()
filtered_counts = counts[counts > 20].to_frame()
filtered_counts

As the table shows, the first location's count is roughly 20 times the second one, so the rest of the
analysis focuses only on the United States.
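The gap between the two largest locations can also be checked numerically instead of being read off the table. The following is a minimal sketch, assuming a `value_counts()` Series like `counts` above; the helper name `top_to_second_ratio` and the toy counts are hypothetical and are not taken from the dataset.

```python
import pandas as pd

def top_to_second_ratio(counts: pd.Series) -> float:
    """Return how many times larger the biggest count is than the second biggest."""
    ordered = counts.sort_values(ascending=False)
    if len(ordered) < 2:
        raise ValueError("need at least two categories to compare")
    return ordered.iloc[0] / ordered.iloc[1]

# Toy counts for illustration only (hypothetical, not the real location counts).
toy_counts = pd.Series({"A": 100, "B": 5, "C": 2})
ratio = top_to_second_ratio(toy_counts)
print(ratio)
```

With the real data, the call would be `top_to_second_ratio(Data["company_location"].value_counts())`, run before the rows are filtered to the United States.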

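# .copy() gives an independent DataFrame, so the in-place drop below does not raise SettingWithCopyWarning.
Data = Data[Data["company_location"] == 'United States'].copy()

Data.drop("company_location", inplace=True, axis=1)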

Data.head()

Data visualization:

The categories

plt.figure(figsize=(15,8))
Data["job_category"].value_counts().plot(kind="bar", color='#00E5E5')

<Axes: xlabel='job_category'>

The job titles

plt.figure(figsize=(15,8))
job_title_counts = Data["job_title"].value_counts()
# Keep only the job titles that appear more than 20 times.
Data2 = Data[Data["job_title"].isin(job_title_counts[job_title_counts > 20].index)]
Data2["job_title"].value_counts().plot(kind="bar", color='#23CE6B').set_ylabel("Count")

Text(0, 0.5, 'Count')
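
The filter used for the job titles (keep only values that occur more than a threshold number of times) can be wrapped in a small reusable helper. This is a sketch under assumptions: the function name `keep_frequent` and the toy rows are hypothetical, and the threshold is passed in rather than fixed at 20.

```python
import pandas as pd

def keep_frequent(df: pd.DataFrame, column: str, min_count: int) -> pd.DataFrame:
    """Keep rows whose value in `column` occurs more than `min_count` times."""
    value_counts = df[column].value_counts()
    frequent_values = value_counts[value_counts > min_count].index
    return df[df[column].isin(frequent_values)]

# Toy rows for illustration only (hypothetical, not the real dataset).
toy = pd.DataFrame(
    {"job_title": ["Data Engineer"] * 3 + ["Data Analyst"] * 2 + ["ML Engineer"]}
)
frequent = keep_frequent(toy, "job_title", min_count=1)
print(frequent["job_title"].value_counts())
```

On the report's data, `keep_frequent(Data, "job_title", 20)` should select the same rows as `Data2`.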

Work settings and Experience levels Comparison

plt.figure(figsize=(20,8))

# Top row: one panel per experience level; bottom row: the two overall distributions.
ax1 = plt.subplot2grid((2,4), (0,0))
ax2 = plt.subplot2grid((2,4), (0,1))
ax3 = plt.subplot2grid((2,4), (0,2))
ax4 = plt.subplot2grid((2,4), (0,3))
ax5 = plt.subplot2grid((2,4), (1,0), colspan=2)
ax6 = plt.subplot2grid((2,4), (1,2), colspan=2)
ax1.hist(Data[Data["experience_level"] == "Senior"]["work_setting"], color='#00FFFF')
ax1.set_title("Senior")
ax3.hist(Data[Data["experience_level"] == "Entry-level"]["work_setting"], color='#00FFFF')
ax3.set_title("Entry-level")
ax2.hist(Data[Data["experience_level"] == "Mid-level"]["work_setting"], color='#00FFFF')
ax2.set_title("Mid-level")
ax4.hist(Data[Data["experience_level"] == "Executive"]["work_setting"], color='#00FFFF')
ax4.set_title("Executive")
ax5.hist(Data["work_setting"],color='#00FFFF')
ax5.set_title("All work settings")
ax6.hist(Data["experience_level"],color='#00FFFF')
ax6.set_title("All experience levels")

plt.show()
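
The per-level histograms above compare raw counts, which are hard to read across levels of very different sizes. A row-normalised cross-tabulation gives the share of each work setting within each experience level. This is a minimal sketch, assuming the `experience_level` and `work_setting` columns from the dataset; the helper name `setting_share_by_level` and the toy rows are hypothetical.

```python
import pandas as pd

def setting_share_by_level(df: pd.DataFrame) -> pd.DataFrame:
    """Share of each work setting within each experience level (each row sums to 1)."""
    return pd.crosstab(df["experience_level"], df["work_setting"], normalize="index")

# Toy rows for illustration only (hypothetical, not the real dataset).
toy = pd.DataFrame({
    "experience_level": ["Senior", "Senior", "Senior", "Senior",
                         "Entry-level", "Entry-level"],
    "work_setting": ["Remote", "Remote", "In-person", "Hybrid",
                     "In-person", "Remote"],
})
shares = setting_share_by_level(toy)
print(shares)
```

Calling `setting_share_by_level(Data)` would produce the same kind of table for the filtered US data.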

Work year and company size

plt.figure(figsize=(16,5))
A = plt.subplot2grid((1,2), (0,0))
B = plt.subplot2grid((1,2), (0,1))

A.hist(Data["work_year"],color='#00FFFF')
A.set_title("Work year count")
A.set_xticks([2020,2021,2022,2023])
B.hist(Data["company_size"],color='#00FFFF')
B.set_title("Company size count")

plt.show()
Data.head()
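
Since the selected feature is experience level, its relation to salary can be summarised directly with a group-by. The following is a sketch only, assuming the `experience_level` and `salary_in_usd` columns listed by `Data.info()`; the helper name `salary_by_experience` and the toy salaries are hypothetical and are not results from the dataset.

```python
import pandas as pd

def salary_by_experience(df: pd.DataFrame) -> pd.DataFrame:
    """Count, median and mean of salary_in_usd per experience level, sorted by median."""
    return (
        df.groupby("experience_level")["salary_in_usd"]
        .agg(["count", "median", "mean"])
        .sort_values("median")
    )

# Toy rows for illustration only (hypothetical values, not the real salaries).
toy = pd.DataFrame({
    "experience_level": ["Entry-level", "Entry-level", "Senior", "Senior", "Senior"],
    "salary_in_usd": [60000, 80000, 140000, 150000, 160000],
})
summary = salary_by_experience(toy)
print(summary)
```

The median is reported alongside the mean because salary distributions tend to be skewed by a few very high values; `salary_by_experience(Data)` would apply the same summary to the report's data.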

Result:
Thus, I successfully implemented the project report on jobs and salaries in data science across
different countries.
