Business Intelligence and Analytics
Project Report
Aim: To present a report on jobs and salaries in different countries using data science.
Problem Statement:
Different industries require distinct skill sets within the data science domain. However, professionals
and employers may lack insights into these specific requirements, leading to misalignment in hiring
and career development efforts.
Feature Selection:
The feature I have selected in this data is Experience Level (the experience_level column). It classifies
the professional experience level of the employee. Common categories might include 'Entry-level',
'Mid-level', 'Senior', and 'Executive', providing insight into how experience influences salary in
data-related roles.
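One way to see how experience relates to salary is to group salaries by experience level. The following is a minimal sketch, not the analysis used in this report: the rows in `sample` are made-up illustration values, and the category labels are assumed to match the examples named above.

```python
import pandas as pd

# Hypothetical toy rows for illustration; the real dataset has 9355 entries.
sample = pd.DataFrame({
    "experience_level": ["Entry-level", "Mid-level", "Senior",
                         "Senior", "Executive", "Entry-level"],
    "salary_in_usd": [60000, 90000, 140000, 160000, 200000, 70000],
})

# Assumed ordering of the categories, from least to most experienced.
level_order = ["Entry-level", "Mid-level", "Senior", "Executive"]

# Count and median salary per experience level, shown in career order.
summary = (
    sample.groupby("experience_level")["salary_in_usd"]
    .agg(["count", "median"])
    .reindex(level_order)
)
print(summary)
```

The median is used here instead of the mean because a few very high salaries would otherwise pull the summary upward.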
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9355 entries, 0 to 9354
Data columns (total 12 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   work_year           9355 non-null   int64
 1   job_title           9355 non-null   object
 2   job_category        9355 non-null   object
 3   salary_currency     9355 non-null   object
 4   salary              9355 non-null   int64
 5   salary_in_usd       9355 non-null   int64
 6   employee_residence  9355 non-null   object
 7   experience_level    9355 non-null   object
 8   employment_type     9355 non-null   object
 9   work_setting        9355 non-null   object
 10  company_location    9355 non-null   object
 11  company_size        9355 non-null   object
dtypes: int64(3), object(9)
memory usage: 877.2+ KB
Data Cleaning:
Data.drop(columns=["salary_currency","salary"],inplace=True)
counts = Data["company_location"].value_counts()
filtered_counts = counts[counts > 20].to_frame()
filtered_counts
As you can see, the first location's count is 20 times the second one's, so we are going to
focus only on the US.
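# Keep only US companies; .copy() avoids modifying a view of the original frame
Data = Data[Data["company_location"] == 'United States'].copy()
# The column is now constant, so it can be dropped
Data.drop("company_location", inplace=True, axis=1)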
Data.head()
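The decision to keep a single location can be backed by a number instead of reading it off the table. The sketch below uses made-up rows (the `toy` frame and its counts are assumptions for illustration, not the report's data) to compute how many times larger the top location's count is than the runner-up's, and then filters on the top location.

```python
import pandas as pd

# Hypothetical rows for illustration only; counts are invented.
toy = pd.DataFrame({
    "company_location": ["United States"] * 40
                        + ["United Kingdom"] * 2
                        + ["Canada"] * 1,
})

# value_counts() sorts from most to least frequent.
counts = toy["company_location"].value_counts()
top_location = counts.index[0]
ratio = counts.iloc[0] / counts.iloc[1]
print(f"{top_location} has {ratio:.0f}x the rows of the next location")

# Filter to the dominant location; .copy() gives an independent frame.
top_only = toy[toy["company_location"] == top_location].copy()
```

Taking the name from `counts.index[0]` avoids hard-coding the location string, which helps if the dataset is refreshed later.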
Data visualization:
The categories:
import matplotlib.pyplot as plt
plt.figure(figsize=(15,8))
# Number of rows per job category
Data["job_category"].value_counts().plot(kind="bar", color='#00E5E5')
<Axes: xlabel='job_category'>
The job titles:
plt.figure(figsize=(15,8))
# Keep only job titles that appear more than 20 times
job_title_counts = Data["job_title"].value_counts()
frequent_titles = job_title_counts[job_title_counts > 20].index
Data2 = Data[Data["job_title"].isin(frequent_titles)]
ax = Data2["job_title"].value_counts().plot(kind="bar", color='#23CE6B')
ax.set_ylabel("Count")
plt.show()
plt.figure(figsize=(16,5))
A = plt.subplot2grid((1,2), (0,0))
B = plt.subplot2grid((1,2), (0,1))
A.hist(Data["work_year"],color='#00FFFF')
A.set_title("Work year count")
A.set_xticks([2020,2021,2022,2023])
B.hist(Data["company_size"],color='#00FFFF')
B.set_title("Company size count")
plt.show()
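Because work year and company size take only a few distinct values, counting each value directly is an alternative to a histogram, whose default bins do not line up with whole years. This is a sketch under assumptions: the `sample` rows are invented, and the size labels 'S', 'M', 'L' are assumed since the report does not list them.

```python
import pandas as pd

# Hypothetical rows for illustration; the labels S/M/L are an assumption.
sample = pd.DataFrame({
    "work_year": [2023, 2022, 2023, 2021, 2023, 2020],
    "company_size": ["M", "L", "M", "S", "M", "L"],
})

# One count per year, sorted chronologically.
year_counts = sample["work_year"].value_counts().sort_index()

# One count per size, in a meaningful small-to-large order.
size_order = ["S", "M", "L"]
size_counts = (
    sample["company_size"].value_counts().reindex(size_order, fill_value=0)
)

print(year_counts)
print(size_counts)
```

Either series can then be drawn with `.plot(kind="bar")`, which gives one bar per value in the order set above.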
Data.head()
Result:
Thus, I successfully implemented a project report on data science jobs and salaries in different
countries.