
Purpose of This Document

This document explains what data to scrape from each site for this project.

Scraper Requirements

Please follow the guidelines below for the scraper:

1. The scraper should be written in Python or JavaScript.
2. If you need to use a proxy for scraping, use ScrapingBee (an API key will be provided to you). If you would recommend a different scraping proxy, please let us know your preference.
3. The scraper should run on both Mac and Ubuntu Linux, and usage instructions must be provided so that our experienced engineers can run it.
4. Payment will be approved once the scraper runs successfully in both required environments.
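
As a rough illustration of requirement 2, the sketch below builds a request URL for the ScrapingBee HTTP API without sending it. It assumes the endpoint https://app.scrapingbee.com/api/v1/ and the api_key, url, and render_js query parameters; please check these against the current ScrapingBee documentation before relying on them. The function name build_proxy_url is hypothetical.

```python
from urllib.parse import urlencode

# Assumed ScrapingBee endpoint; verify against the provider's documentation.
SCRAPINGBEE_ENDPOINT = "https://app.scrapingbee.com/api/v1/"


def build_proxy_url(api_key: str, target_url: str, render_js: bool = False) -> str:
    """Return the proxy request URL for fetching target_url through ScrapingBee."""
    params = {
        "api_key": api_key,
        "url": target_url,  # urlencode percent-encodes the target URL
        "render_js": "true" if render_js else "false",
    }
    return f"{SCRAPINGBEE_ENDPOINT}?{urlencode(params)}"
```

The resulting URL can be fetched with any HTTP client. Keeping the API key as a parameter rather than a constant makes it easier to supply the key that will be provided to you.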

The scraper should output two spreadsheets in CSV format (please see the sample output):
1. salary_information.csv = Salary information from the requested websites for the 5 desired cyber security positions
2. listed_positions.csv = A list of all job titles and page URLs on all requested websites

Application Response

To apply for this project, please provide answers to the questions listed in the attached application_response.csv file.

Glassdoor Scraping

Job Titles & Pages to Scrape

Canonical Job Title      | Job Title in Site + Link
Cloud Security Engineer  | Cloud Security Engineer
Security Engineer        | Security Engineer
CISO                     | Chief Information Security Officer
Cloud Security Architect | Cloud Security Architect
Cloud Security Analyst   | Cloud Security Analyst

Data To Scrape

Data                | Exists in this site? | Format for output CSV
Update Date         | Y (Jun 14, 2023)     | Format as 2023-06-14
Level of Confidence | Y (“Confident”)      | Format as “Confident”
Avg Total Pay       | Y ($120,976)         | Format as 120976
Avg Base Pay        | Y ($101,179)         | Format as 101179
Avg Additional Pay  | Y ($19,976)          | Format as 19976
Value of Benefits   | N                    | N/A
Min Possible        | Y ($78K)             | Format as 78000
Min Likely          | Y ($96K)             | Format as 96000
Max Likely          | Y ($154K)            | Format as 154000
Max Possible        | Y ($189K)            | Format as 189000
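
The formatting rules in the table above could be implemented along the following lines. This is only a sketch: the function names parse_money and parse_update_date are hypothetical, and it assumes the site displays amounts like "$120,976" or "$78K" and dates like "Jun 14, 2023" with English month names.

```python
import re
from datetime import datetime


def parse_money(text: str) -> int:
    """Convert a displayed amount such as "$120,976" or "$78K" to an integer."""
    match = re.search(r"(\d[\d,]*(?:\.\d+)?)\s*([Kk]?)", text)
    if match is None:
        raise ValueError(f"no amount found in {text!r}")
    number = float(match.group(1).replace(",", ""))
    if match.group(2):  # a trailing "K" means thousands
        number *= 1000
    return round(number)


def parse_update_date(text: str) -> str:
    """Convert "Jun 14, 2023" or "June 26, 2023" to ISO format (YYYY-MM-DD)."""
    # Try the abbreviated month name first, then the full month name.
    # Assumes an English locale for month names.
    for fmt in ("%b %d, %Y", "%B %d, %Y"):
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")
```

Raising an error on unexpected input, rather than writing a guess, makes it easier to notice when a site changes its display format.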


How to apply and log filters

1. First, scrape the general data without any search filters.
   a. Log the general details in a row in salary_information.csv with filter type = “All”.
2. Change one filter at a time.
   a. Once changed, hit search if needed.
   b. Log the new values for this filter in a new row in salary_information.csv, with the filter type and value as instructed below.
3. Do NOT apply a combination of filters.

Filters to use:

Filter                 | Exists in this site? | Format for output CSV
(no filter applied)    | Yes                  | Filter Type: “All”; Filter Value: “”
By Industry            | Yes                  | Filter Type: “Industry”; Filter Value: “Legal” (iterate all possible values)
By years of experience | Yes                  | Filter Type: “Experience”; Filter Value: “0-1” (iterate all possible values)
By State               | Yes                  | Filter Type: “State”; Filter Value: “CA, USA” (use the 2-letter state code; iterate all 50 US states plus the District of Columbia = “DC”, 51 values in total)
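
One possible way to make sure filters are applied one at a time, and never combined, is to build the full list of (filter type, filter value) pairs up front and scrape once per pair. The sketch below assumes the industry and experience values are collected from the site at run time, and that the District of Columbia is written as "DC, USA" in the same way as the states; the names US_STATE_CODES and build_filter_plan are hypothetical.

```python
# 50 US states plus the District of Columbia (51 codes in total).
US_STATE_CODES = [
    "AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "FL", "GA",
    "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD",
    "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ",
    "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC",
    "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY",
    "DC",
]


def build_filter_plan(industries, experience_levels):
    """Return (filter_type, filter_value) pairs, one per scrape.

    Each pair uses exactly one filter, so no combination is ever applied.
    Pass an empty list for a filter the site does not offer.
    """
    plan = [("All", "")]  # the unfiltered scrape comes first
    plan += [("Industry", value) for value in industries]
    plan += [("Experience", value) for value in experience_levels]
    plan += [("State", f"{code}, USA") for code in US_STATE_CODES]
    return plan
```

Each pair in the plan corresponds to one row in salary_information.csv for a given job title, and the filter should be reset before moving on to the next pair.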

Salary.com Scraping

Job Titles & Pages to Scrape

Canonical Job Title      | Job Title in Site + Link
Cloud Security Engineer  | Cloud Security Engineer
Security Engineer        | Software Security Engineer
CISO                     | Chief Information Security Officer
Cloud Security Architect | Security Architect
Cloud Security Analyst   | Information Security Analyst I

Data To Scrape

Data                | Exists in this site?                                                | Format for output CSV
Update Date         | Y (June 26, 2023)                                                   | Format as 2023-06-26
Level of Confidence | N/A                                                                 | N/A
Avg Total Pay       | Y - “Median Salary” + “Median Bonus” ($101,100 + $5,258 = $106,358) | Format as 106358
Avg Base Pay        | Y - “Median Salary” ($101,100)                                      | Format as 101100
Avg Additional Pay  | Y - “Median Bonus” - shown in benefits table ($5,258)               | Format as 5258
Value of Benefits   | N/A                                                                 | N/A
Min Possible        | Y ($81,269) - take from salary + bonus table                        | Format as 81269
Min Likely          | Y ($93,226) - take from salary + bonus table                        | Format as 93226
Max Likely          | Y ($122,919) - take from salary + bonus table                       | Format as 122919
Max Possible        | Y ($137,993) - take from salary + bonus table                       | Format as 137993
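
Since Salary.com does not show a single total figure, Avg Total Pay has to be derived as “Median Salary” plus “Median Bonus”. A minimal sketch of that arithmetic follows; the function name and the dictionary keys are hypothetical and should be matched to the columns in the sample output.

```python
def salary_com_pay_fields(median_salary: int, median_bonus: int) -> dict:
    """Derive the three pay fields for Salary.com from its two displayed medians."""
    return {
        "avg_base_pay": median_salary,        # "Median Salary"
        "avg_additional_pay": median_bonus,   # "Median Bonus" (benefits table)
        "avg_total_pay": median_salary + median_bonus,
    }
```

With the example values from the table, salary_com_pay_fields(101100, 5258) gives an avg_total_pay of 106358.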

How to apply and log filters

1. First, scrape the general data without any search filters.
   a. Log the general details in a row in salary_information.csv with filter type = “All”.
2. Change one filter at a time.
   a. Once changed, hit search if needed.
   b. Log the new values for this filter in a new row in salary_information.csv, with the filter type and value as instructed below.
3. Do NOT apply a combination of filters.

Filters to use:

Filter                 | Availability for This Site | Format for output CSV
(no filter applied)    | Yes                        | Filter Type: “All”; Filter Value: “”
By Industry            | No                         | None
By years of experience | Yes                        | Filter Type: “Experience”; Filter Value: “<1” (iterate all possible values)
By State               | Yes                        | Filter Type: “State”; Filter Value: “CA, USA” (use the 2-letter state code; iterate all 50 US states plus the District of Columbia = “DC”, 51 values in total)

Talent.com Scraping

Job Titles & Pages to Scrape

Canonical Job Title      | Job Title in Site + Link
Cloud Security Engineer  | Cloud Security Engineer
Security Engineer        | Security Engineer
CISO                     | Ciso
Cloud Security Architect | Cloud Security Architect
Cloud Security Analyst   | Cloud Security Analyst

Data To Scrape

Data                | Exists in this site?           | Format for output CSV
Update Date         | N (Just says 2023)             | Use current date of scraping, format as 2023-06-26
Level of Confidence | Y - “Based on 1820 salaries”   | Format as “1820 salaries”
Avg Total Pay       | Y ($157,528)                   | Format as 157528
Avg Base Pay        | N/A                            | N/A
Avg Additional Pay  | N/A                            | N/A
Value of Benefits   | N/A                            | N/A
Min Possible        | Y ($138,666) - shown as “Low”  | Format as 138666
Min Likely          | N/A                            | N/A
Max Likely          | N/A                            | N/A
Max Possible        | Y ($190,000) - shown as “High” | Format as 190000
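
The two Talent.com-specific rules (the confidence text and the scrape date) might be handled roughly as follows. This sketch assumes the site text always has the form "Based on N salaries"; the function names format_confidence and scrape_date are hypothetical.

```python
import re
from datetime import date
from typing import Optional


def format_confidence(text: str) -> str:
    """Turn "Based on 1820 salaries" into "1820 salaries"."""
    match = re.search(r"Based on\s+([\d,]+)\s+salaries", text)
    if match is None:
        raise ValueError(f"unexpected confidence text: {text!r}")
    count = int(match.group(1).replace(",", ""))  # tolerate "1,820"
    return f"{count} salaries"


def scrape_date(today: Optional[date] = None) -> str:
    """Return the date of scraping as YYYY-MM-DD (the site only shows the year)."""
    return (today or date.today()).isoformat()
```

Accepting an optional date in scrape_date keeps the function easy to test while still defaulting to the actual day the scraper runs.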

How to apply and log filters

1. First, scrape the general data without any search filters.
   a. Log the general details in a row in salary_information.csv with filter type = “All”.
2. Change one filter at a time.
   a. Once changed, hit search if needed.
   b. Log the new values for this filter in a new row in salary_information.csv, with the filter type and value as instructed below.
3. Do NOT apply a combination of filters.

Filters to use:

Filter                 | Availability for This Site | Format for output CSV
(no filter applied)    | Yes                        | Filter Type: “All”; Filter Value: “”
By Industry            | No                         | None
By years of experience | No                         | None
By State               | Yes                        | Filter Type: “State”; Filter Value: “CA, USA” (use the 2-letter state code; iterate all 50 US states plus the District of Columbia = “DC”, 51 values in total)

Note: On this site the state links are clickable, and NOT all states are shown on a single page. Iterate state by state and navigate to the new pages to scrape them.
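
Because not every state is linked from a single Talent.com page, it may help to record which state links were found on each page and which states are still missing. The sketch below uses only the Python standard library and assumes that state links are plain anchor elements whose text is the state name; the real page structure has not been verified, and the names LinkCollector and state_links are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (link text, href) pairs from every <a> element with an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def state_links(html: str, state_names: dict) -> dict:
    """Map 2-letter codes to hrefs for the states linked in this HTML.

    state_names maps codes to the names shown on the site, e.g. {"CA": "California"}.
    """
    parser = LinkCollector()
    parser.feed(html)
    by_name = {text: href for text, href in parser.links}
    return {code: by_name[name] for code, name in state_names.items() if name in by_name}
```

States missing from the returned mapping are the ones to look for on the pages reached by following the links that were found.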

Listed Positions

A second scraper should be built to get a list of all the positions available on all the requested sites. Please see the requested output format attached.

NOTE: Some sites have two types of pages - pages with “full” data and other pages with
“partial” data. For each page in each website, your list should indicate the type of page.
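
A possible shape for the listed_positions.csv output, including the page type, is sketched below. The column names (site, job_title, page_url, page_type) are assumptions and should be replaced with the ones in the attached output format; the names ListedPosition and listed_positions_csv are hypothetical.

```python
import csv
import io
from dataclasses import asdict, dataclass

PAGE_TYPES = ("full", "partial")
FIELDNAMES = ["site", "job_title", "page_url", "page_type"]  # assumed column names


@dataclass
class ListedPosition:
    site: str
    job_title: str
    page_url: str
    page_type: str  # "full" or "partial"


def listed_positions_csv(positions) -> str:
    """Render positions as CSV text, rejecting unknown page types."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    for position in positions:
        if position.page_type not in PAGE_TYPES:
            raise ValueError(f"unknown page type: {position.page_type!r}")
        writer.writerow(asdict(position))
    return buffer.getvalue()
```

The returned text can be written directly to listed_positions.csv. Validating the page type at write time helps keep the “full” and “partial” labels consistent across sites.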
