Salary Scraping Project - SOW

Purpose of This Document
This document explains what data to scrape from each site for this project.
Scraper Requirements
Please follow the guidelines below for the scraper:

1. The scraper should be written in Python or JavaScript.
2. If you need to use a proxy for scraping, use ScrapingBee (an API key will be provided to
you). If you would recommend a different scraping proxy, please tell us your preference.
3. The scraper should run on Mac and Ubuntu Linux, and usage instructions must be provided
for our experienced engineers.
4. Payment will be approved once the scraper runs successfully in both required
environments.
5. The scraper should output two spreadsheets in CSV format (please see the sample output):
a. salary_information.csv = Salary information from the requested websites for the 5 desired
cyber security positions
b. listed_positions.csv = A list of all job titles and page URLs on all requested websites
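
For the proxy requirement, a minimal Python sketch of how a request could be routed through ScrapingBee is shown here. It only builds the request URL and makes no network call. The endpoint and the api_key, url and render_js parameters follow ScrapingBee's public HTML API as we understand it and should be checked against the current ScrapingBee documentation; the function name build_proxy_url is hypothetical.

```python
from urllib.parse import urlencode

# Assumed ScrapingBee HTML API endpoint; verify against the current ScrapingBee docs.
SCRAPINGBEE_ENDPOINT = "https://app.scrapingbee.com/api/v1/"


def build_proxy_url(target_url: str, api_key: str, render_js: bool = False) -> str:
    """Return the URL to GET in order to fetch target_url through the proxy."""
    params = {
        "api_key": api_key,
        "url": target_url,  # urlencode percent-encodes the target URL
        "render_js": "true" if render_js else "false",
    }
    return SCRAPINGBEE_ENDPOINT + "?" + urlencode(params)
```

The returned URL can then be fetched with any HTTP client. Keeping the API key as a parameter (for example, read from an environment variable) avoids hard-coding it in the scraper.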
Application Response
To apply for this project, please answer the questions listed in the attached
application_response.csv file.
Glassdoor Scraping
Job Titles & Pages to Scrape
Canonical Job Title        Job Title in Site + Link
Cloud Security Engineer    Cloud Security Engineer
Security Engineer          Security Engineer
CISO                       Chief Information Security Officer
Cloud Security Architect   Cloud Security Architect
Cloud Security Analyst     Cloud Security Analyst
Data To Scrape
Data                  Exists in this site?   Format for output CSV
Update Date           Y (Jun 14, 2023)       Format as 2023-06-14
Level of Confidence   Y (“Confident”)        Format as “Confident”
Avg Total Pay         Y ($120,976)           Format as 120976
Avg Base Pay          Y ($101,179)           Format as 101179
Avg Additional Pay    Y ($19,976)            Format as 19976
Value of Benefits     N                      N/A
Min Possible          Y ($78K)               Format as 78000
Min Likely            Y ($96K)               Format as 96000
Max Likely            Y ($154K)              Format as 154000
Max Possible          Y ($189K)              Format as 189000
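
The output formats in the table above could be produced with helpers along the following lines. This is a sketch only: the function names normalize_date and normalize_money are hypothetical, and it assumes the site shows dates with English month names and amounts either in full ($120,976) or abbreviated in thousands ($78K).

```python
import re
from datetime import datetime


def normalize_date(text: str) -> str:
    """Convert 'Jun 14, 2023' or 'June 26, 2023' to ISO format, e.g. '2023-06-14'."""
    # Try the abbreviated month name first, then the full month name.
    for fmt in ("%b %d, %Y", "%B %d, %Y"):
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date: {text!r}")


def normalize_money(text: str) -> int:
    """Convert '$120,976' to 120976 and '$78K' to 78000."""
    match = re.fullmatch(r"\$?\s*([\d,]+(?:\.\d+)?)\s*([Kk]?)", text.strip())
    if match is None:
        raise ValueError(f"Unrecognized amount: {text!r}")
    amount = float(match.group(1).replace(",", ""))
    if match.group(2):  # a trailing K means thousands
        amount *= 1000
    return round(amount)
```

Raising an error on unrecognized input, rather than writing a guess to the CSV, makes layout changes on the site visible during a run.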

How to apply and log filters
1. First, scrape the general data without any search filters.
a. Log the general details in a row in salary_information.csv with filter type = “All”.
2. Change one filter at a time.
a. Once changed, hit search if needed.
b. Log the new values for this filter in a new row in salary_information.csv with
the filter type and value as instructed below.
3. Do NOT apply a combination of filters.
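
The steps above can be sketched as a generator that yields one (filter type, filter value) pair per page to scrape. The name filter_plan and the shape of its input are assumptions made for illustration; the values for each filter would come from the options the site actually offers.

```python
from typing import Dict, Iterator, List, Tuple


def filter_plan(filter_values: Dict[str, List[str]]) -> Iterator[Tuple[str, str]]:
    """Yield (filter_type, filter_value) pairs, changing one filter at a time."""
    # Step 1: the unfiltered page is logged with filter type "All" and an empty value.
    yield ("All", "")
    # Steps 2-3: vary exactly one filter per request; filters are never combined.
    for filter_type, values in filter_values.items():
        for value in values:
            yield (filter_type, value)
```

For example, filter_plan({"Industry": ["Legal"], "Experience": ["0-1"]}) yields three pairs, so three rows would be logged in salary_information.csv for that job title. The scraper should reset the previous filter before applying the next one.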
Filters to use:
Filter                   Exists in this site?   Format for output CSV
(no filter applied)      Yes                    Filter Type: “All”
                                                Filter Value: “”
By Industry              Yes                    Filter Type: “Industry”
                                                Filter Value: “Legal” (iterate all possible values)
By years of experience   Yes                    Filter Type: “Experience”
                                                Filter Value: “0-1” (iterate all possible values)
By State                 Yes                    Filter Type: “State”
                                                Filter Value: “CA, USA” (use 2-letter state code; iterate all
                                                50 US states + District of Columbia = “DC”, 51 values in total)
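
For the State filter, the 50 US states plus the District of Columbia give 51 filter values. A minimal sketch of generating them in the “CA, USA” format is shown here; the names STATE_CODES and state_filter_values are hypothetical.

```python
from typing import List

# 2-letter postal codes for the 50 US states plus the District of Columbia (DC).
STATE_CODES = [
    "AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "DC", "FL", "GA",
    "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD", "MA",
    "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ", "NM", "NY",
    "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC", "SD", "TN", "TX",
    "UT", "VT", "VA", "WA", "WV", "WI", "WY",
]


def state_filter_values() -> List[str]:
    """Return the State filter values in the 'CA, USA' output format."""
    return [f"{code}, USA" for code in STATE_CODES]
```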
Salary.com Scraping
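Job Titles & Pages to Scrape
Canonical Job Title        Job Title in Site + Link
Cloud Security Engineer    Cloud Security Engineer
Security Engineer          Software Security Engineer
CISO                       Chief Information Security Officer
Cloud Security Architect   Security Architect
Cloud Security Analyst     Information Security Analyst I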
Data To Scrape
Data                  Exists in this site?                           Format for output CSV
Update Date           Y (June 26, 2023)                              Format as 2023-06-26
Level of Confidence   N/A                                            N/A
Avg Total Pay         Y - “Median Salary” + “Median Bonus”           Format as 106358
                      ($101,100 + $5,258 = $106,358)
Avg Base Pay          Y - “Median Salary” ($101,100)                 Format as 101100
Avg Additional Pay    Y - “Median Bonus” - shown in benefits table   Format as 5258
                      ($5,258)
Value of Benefits     N/A                                            N/A
Min Possible          Y ($81,269) - take from salary + bonus table   Format as 81269
Min Likely            Y ($93,226) - take from salary + bonus table   Format as 93226
Max Likely            Y ($122,919) - take from salary + bonus table  Format as 122919
Max Possible          Y ($137,993) - take from salary + bonus table  Format as 137993
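
Because Salary.com does not show a total directly, Avg Total Pay has to be derived from the two median figures. A small sketch of that arithmetic follows; the function name salary_com_pay_fields is hypothetical, and the dictionary keys reuse the row labels above as a stand-in for the real CSV headers, which should follow the sample output.

```python
from typing import Dict, Union


def salary_com_pay_fields(median_salary: int, median_bonus: int) -> Dict[str, Union[int, str]]:
    """Derive the pay columns for a Salary.com page from its two median values."""
    return {
        "Avg Total Pay": median_salary + median_bonus,  # e.g. 101100 + 5258 = 106358
        "Avg Base Pay": median_salary,
        "Avg Additional Pay": median_bonus,
        # Not available on this site.
        "Level of Confidence": "N/A",
        "Value of Benefits": "N/A",
    }
```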

Filters to use:
Filter                   Availability for This Site   Format for output CSV
(no filter applied)      Yes                          Filter Type: “All”
By Industry              No                           None
By years of experience   Yes                          Filter Type: “Experience”
                                                      Filter Value: “<1” (iterate all possible values)
By State                 Yes                          Filter Type: “State”
Talent.com Scraping
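Job Titles & Pages to Scrape
Canonical Job Title        Job Title in Site + Link
Cloud Security Engineer    Cloud Security Engineer
Security Engineer          Security Engineer
CISO                       Ciso
Cloud Security Architect   Cloud Security Architect
Cloud Security Analyst     Cloud Security Analyst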
Data To Scrape
Data                  Exists in this site?            Format for output CSV
Update Date           N (just says 2023)              Use current date of scraping, format as 2023-06-26
Level of Confidence   Y - “Based on 1820 salaries”    Format as “1820 salaries”
Avg Total Pay         Y ($157,528)                    Format as 157528
Avg Base Pay          N/A                             N/A
Avg Additional Pay    N/A                             N/A
Value of Benefits     N/A                             N/A
Min Possible          Y ($138,666) - shown as “Low”   Format as 138666
Min Likely            N/A                             N/A
Max Likely            N/A                             N/A
Max Possible          Y ($190,000) - shown as “High”  Format as 190000
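
Two Talent.com fields need special handling: the confidence text must be shortened, and the update date must be taken from the day of scraping. A minimal sketch follows, with hypothetical function names confidence_label and update_date, and assuming the page text always contains a count followed by the word "salaries".

```python
import re
from datetime import date
from typing import Optional


def confidence_label(text: str) -> str:
    """Turn 'Based on 1820 salaries' into '1820 salaries'."""
    match = re.search(r"([\d,]+)\s+salaries", text)
    if match is None:
        raise ValueError(f"No salary count found in: {text!r}")
    count = int(match.group(1).replace(",", ""))  # tolerate '1,820'
    return f"{count} salaries"


def update_date(scraped_on: Optional[date] = None) -> str:
    """Return the scraping date as YYYY-MM-DD; defaults to today."""
    return (scraped_on or date.today()).isoformat()
```

Passing the date in explicitly, as in update_date(date(2023, 6, 26)), keeps every row of a single run on the same date even if the run crosses midnight.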

Filters to use:
Filter                   Availability for This Site   Format for output CSV
By Industry              No                           None
By years of experience   No                           None
Note: Links are clickable and NOT all states are shown. Iterate state by state and
navigate to the new pages to scrape them.
Listed Positions
A second scraper should be built to get a list of all the positions available on all the requested
sites. Please see the requested output format attached.
NOTE: Some sites have two types of pages - pages with “full” data and other pages with
“partial” data. For each page in each website, your list should indicate the type of page.
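
A possible shape for the listed_positions.csv writer is sketched here. The column names (site, job_title, url, page_type) are assumptions made for illustration only; the real columns should follow the attached output format. The function name write_listed_positions is also hypothetical.

```python
import csv
from typing import Iterable, TextIO, Tuple

# Hypothetical column names; replace with those in the requested output format.
COLUMNS = ["site", "job_title", "url", "page_type"]
PAGE_TYPES = {"full", "partial"}

Row = Tuple[str, str, str, str]


def write_listed_positions(rows: Iterable[Row], out: TextIO) -> None:
    """Write one CSV line per position page, tagged as a 'full' or 'partial' page."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(COLUMNS)
    for site, job_title, url, page_type in rows:
        if page_type not in PAGE_TYPES:
            raise ValueError(f"Unknown page type: {page_type!r}")
        writer.writerow([site, job_title, url, page_type])
```

When writing to a real file, open it with newline="" as the Python csv documentation recommends, so that line endings are not translated twice.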
