
Hinge/Bumble Data Scrape

Brief: Hinge and Bumble are mobile dating apps with over 1 million and 5 million users,
respectively. Compared to other dating services, their key benefit is the visibility and
customizability of data. Users can see many more features of their potential matches than on
services like Tinder, which creates great opportunities for data analysis.

End Product Goal: We want to create a website where a user logs in through an API, which triggers
a script that scrapes their Bumble account for the information listed below. This data is fed
into a pipeline that uses ML models and visualizations to convey insights into their
dating life. A similar product is at this link.

User Journey:
(1) User visits our website
(2) Using an API, we log into www.bumble.com with their credentials
(3) Scrape the information below from their profile
(4) Store the user data and display the outputs of models and visualizations built on it (THIS IS
NOT IN SCOPE OF PROJECT...I WILL COMPLETE THIS MYSELF)

Data Needed: There are three main sections of data found on three different webpages within
the website after you log in.
(1) User Data
(2) Preference Data
(3) Interaction Data

User Data ⇒ (https://bumble.com/app/edit-profile)


1) Main profile photo
2) Secondary profile photos

Users can have up to 3 Questions and 3 Answers. I would like to extract these separately such
that the final output has values/columns for Q1, A1, Q2, A2, Q3, A3. If a user has fewer than 3
Questions, Q3 and A3 would be empty/null.
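The Q1/A1 to Q3/A3 layout described above could be produced along these lines. This is only a sketch: the function name `flatten_prompts` and the input shape (a list of question/answer pairs already scraped from the page) are assumptions, not part of the brief.

```python
from typing import Optional

MAX_PROMPTS = 3  # the brief states a user can have up to 3 Questions and Answers


def flatten_prompts(pairs: list[tuple[str, str]]) -> dict[str, Optional[str]]:
    """Map scraped (question, answer) pairs to fixed Q1/A1 ... Q3/A3 columns.

    `pairs` is a hypothetical input shape: one tuple per prompt found on the page.
    Missing prompts are filled with None so every row has the same columns.
    """
    row: dict[str, Optional[str]] = {}
    for i in range(MAX_PROMPTS):
        question, answer = pairs[i] if i < len(pairs) else (None, None)
        row[f"Q{i + 1}"] = question
        row[f"A{i + 1}"] = answer
    return row


# A user with a single prompt: Q2/A2 and Q3/A3 come back as None.
example = flatten_prompts([("If I could have a superpower it'd be", "flying")])
```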
This is their About Me section. We can scrape this entire text blurb as is.

1) Job history. Note that there can be multiple jobs for one user. We can scrape the most
recent or top n jobs...whatever is easiest
2) Education history. Same as job history in terms of what to keep
1) All of these should be their own columns/values. If there is no answer (i.e. there is a "+"
in the box), then we should mark it as null.
2) Gender is a mandatory response
3) These aren’t mandatory, and they only have one response (unlike Job or Education)
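The null rule for unanswered attributes could be applied with a small helper like the one below. It is a sketch that assumes an empty box is scraped as a bare "+" or as empty text; the names `normalize_field` and `normalize_profile` and the example field names are hypothetical.

```python
from typing import Optional

# Assumption: an unanswered box yields only a "+" (or nothing) when scraped.
EMPTY_MARKERS = {"", "+"}


def normalize_field(raw: Optional[str]) -> Optional[str]:
    """Return the cleaned value, or None when the box was left unanswered."""
    if raw is None:
        return None
    text = raw.strip()
    return None if text in EMPTY_MARKERS else text


def normalize_profile(fields: dict[str, Optional[str]]) -> dict[str, Optional[str]]:
    """Apply the null rule to every scraped attribute, one column per attribute."""
    return {name: normalize_field(value) for name, value in fields.items()}


# Hypothetical scraped values: gender is answered, the other two are not.
cleaned = normalize_profile({"gender": " Woman ", "height": "+", "hometown": ""})
```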

Preference Data ⇒ (https://bumble.com/app/settings)

1) Need the date mode and boolean for whether they are snoozed or not (snoozed = inactive)
2) This filter tells us what types of dates this person is looking for
3) Age Range (need two values...lower bound and upper bound)
4) Distance (maximum distance)
1) This tells us if the profile is a free or premium profile. Need to scrape whether this text
is present → if it says we have "free filters remaining" then we have a free account
2) Boolean for whether the filters are being applied
3) The actual filters that are used (for free accounts only up to 2 of these filters can be
used...the rest should be marked null)
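Two of the preference rules above are simple text checks, sketched below. The exact wording of the age range on the settings page is not given in this brief, so the parser just takes the first two integers it finds; `parse_age_range` and `is_free_account` are hypothetical names.

```python
import re
from typing import Optional


def parse_age_range(text: str) -> tuple[Optional[int], Optional[int]]:
    """Split an age-range label into (lower bound, upper bound).

    Assumption: the label contains the two bounds as integers, lower first.
    Returns (None, None) when fewer than two numbers are present.
    """
    numbers = re.findall(r"\d+", text)
    if len(numbers) < 2:
        return (None, None)
    return (int(numbers[0]), int(numbers[1]))


def is_free_account(page_text: str) -> bool:
    """Free accounts show the phrase "free filters remaining" on the page."""
    return "free filters remaining" in page_text.lower()


# Hypothetical scraped strings, for illustration only.
bounds = parse_age_range("Between 24 and 32")
free = is_free_account("You have 2 free filters remaining")
```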

Interaction Data ⇒ (https://bumble.com/app/connections)

1) Need the chat message, date and time (if possible) for each message sent and received for
each match
2) About Me section.
3) These correspond to the long list of attributes about a user. Some are left out so we can
leave them as null.
4) We also need the primary profile photo and secondary profile photos of the users we
interact with
5) We want to grab all text on the right side of the screen, which corresponds to the same
information for the user
a) Question and Answer (i.e. Q1: If I could have a superpower it'd be…)
b) Location
c) Anything else

** This of course needs to be repeated for every match on the page **

Data Schema

- dict_user: dictionary containing all scraped USER DATA
- dict_preferences: dictionary containing all scraped PREFERENCE DATA
- Tbl_messages: contains a match_id, timestamp, and all messages sent and received, with
labels (i.e. MESSAGE DATA)
- Tbl_match: contains match_id and demographic information about the person (name,
height, age, etc.), i.e. MATCH DATA
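One possible shape for the two tables is sketched below. Only match_id, the timestamp, the sent/received label, and the example demographic fields (name, height, age) come from this brief; the class names, the `direction` and `text` field names, and the value formats are assumptions.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class MessageRow:
    """One row of Tbl_messages: a single chat message for one match."""
    match_id: str
    timestamp: Optional[str]  # date and time if available, else None
    direction: str            # label: "sent" or "received" (assumed values)
    text: str


@dataclass
class MatchRow:
    """One row of Tbl_match: demographic information about a match."""
    match_id: str
    name: Optional[str] = None
    age: Optional[int] = None
    height: Optional[str] = None  # attributes left out by the match stay None


# Hypothetical rows showing how the tables join on match_id.
tbl_match = [asdict(MatchRow(match_id="m1", name="Alex", age=29))]
tbl_messages = [
    asdict(MessageRow("m1", None, "sent", "Hi!")),
    asdict(MessageRow("m1", None, "received", "Hello")),
]
```

Keeping match_id on every message row means the messages can be joined back to the match's demographic data later in the pipeline.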
______________________

User Data:
- My Photos & Videos
- My Answers
- Prompts, Answers
- My Virtues
- Work, Job Title, School, Education Level, Religious Beliefs, Hometown, Politics
- My Vitals
- Name, Gender, Pronouns, Sexuality, Age, Height, Location, Ethnicity, Children,
Family Plans, Covid Vaccine
- My Vices
- Habits for Drinking, Smoking, Marijuana, Drugs
- My Accounts
- Linked Instagram account

Preference Data

- Basic Preferences
- I’m Interested In
- My Neighborhood
- Member Preferences
- Age Range (+ Dealbreaker Boolean)
- Maximum Distance (+ Dealbreaker Boolean)
- Ethnicity (+ Dealbreaker Boolean)
- Religion (+ Dealbreaker Boolean)
- Preferred Preferences (Only for Preferred Member, Dealbreaker Boolean for all)
- Height, Children, Family Plans, Education Level, Politics, Drinking, Smoking,
Marijuana, Drugs

Interaction Data
- Likes You → User profiles who liked you
- Split by whether you match with them or not
- Which prompt or photo was liked
- Photos downloaded
- You Like → User profiles who you liked
- Split by whether they match with you or not
- Photos downloaded
- Message Data
- Every message sent between you and users you match or like
