Weaponizing Data Science for Social Engineering:
Automated E2E Spear Phishing on Twitter
John Seymour | Philip Tully

1 #SNAP_R
You care about phishing on social media
2 #SNAP_R
TL;DR
#SNAP_R: Social Network Automated Phishing with Reconnaissance
[Diagram labels: Twitter Profiles, #SNAP_R, Phishing Offense]
3 #SNAP_R
ISO: Demo Volunteers
Tweet #SNAP_R before the demo to get an example tweet!
4 #SNAP_R
#whoami
John Seymour (@_delta_zero)
Data Scientist at ZeroFOX
Ph.D. student at UMBC
Researches Malware Datasets

Philip Tully (@phtully)
Senior Data Scientist at ZeroFOX
Ph.D. student at University of Edinburgh & Royal Institute of Technology
Brain Modeling & Artificial Neural Nets
5 #SNAP_R
A Novel Phishing Campaign Design
[Chart: Success Rate (Low to High) vs. Level of Effort (Low to High)]
Approach         Automation         Accuracy
Phishing         Mostly Automated   5-14%
Our #SNAP_R      Fully Automated    >30%
Spear Phishing   Highly Manual      45%
6 #SNAP_R
Fooling Humans for 50 Years
1966: ELIZA Chatbot
! Joseph Weizenbaum, MIT
! Parsing & keyword replacement
2016: @TayandYou
! Microsoft AI
! Deep Neural Network
7 #SNAP_R
InfoSec ML Historically Prioritizes Defense
8 #SNAP_R
Machine Learning on Offense
Automated Target Discovery
Automated Social Spear Phishing
Evaluation and Metrics
Results and Demo
Wrap Up
9 #SNAP_R
Machine Learning on Offense

10 #SNAP_R
Why Twitter?
! Bot-friendly API
! Colloquial syntax
! Shortened links
! Trusting culture
! Incentivized data disclosure
11 #SNAP_R
Shoutout
Where Do the Phishers Live? Collecting Phishers' Geographic Locations from Automated Honeypots
Robbie Gallagher
We've taken a novel approach to automating the determination of a phisher's geographic location. With the help of Markov chains, we craft honeypot responses to phishers' emails in an attempt to beat them at their own game. We'll examine the underlying concepts, implementation of the system and reveal some results from our ongoing experiment.
12 #SNAP_R
Techniques, Tactics and Procedures
! Our ML Tool...
  ! Shortens payload per unique user
  ! Auto-tweets at irregular intervals
  ! Triages users wrt value/engagement
  ! Prepends tweets with @mention
  ! Obeys rate limits
! We added...
  ! Post non-phishing posts
  ! Build believable profile
13 #SNAP_R
Design Flow
is_target(user)
get_timeline(depth)
gen_markov_tweet() or gen_nn_tweet()
schedule_tweet_and_sleep(), post_tweet_and_sleep()
14 #SNAP_R
Automated Target Discovery

15 #SNAP_R
Triage of High Value Targets on Twitter
! Accessible personal info
! Historical profile posts
! Heterogeneous data
  ! Text, images, urls, stats, dates
16 #SNAP_R
Extracting Features from GET users/lookup
! Engagement: following/followers
! #myFirstTweet
! Default settings
! Description content
! Account age
17 #SNAP_R
Clustering Predicts High Value Users
[Figure: clustering plots with the point for Eric Schmidt labeled]
18 #SNAP_R
Selecting the Best Clustering Model
! Many algorithms
! Many hyperparameters
! Max avg. score [-1,..,1]
! 0.5-0.7 reasonable structure
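
The [-1, 1] score range and the 0.5-0.7 rule of thumb above match the silhouette coefficient, although the slide does not name the metric. Assuming that is the score meant, a minimal sketch of "pick the candidate clustering with the highest average score" could look like the following. The toy points, the candidate labelings, and the function names are all made up for illustration; in practice the candidates would come from different algorithms and hyperparameter settings.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points; ranges from -1 to 1."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    def mean_dist(i, members):
        # Average distance from point i to the other members of a cluster.
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    total = 0.0
    for i, label in enumerate(labels):
        if len(clusters[label]) == 1:
            continue  # a singleton cluster contributes a score of 0
        a = mean_dist(i, clusters[label])  # cohesion: distance within own cluster
        b = min(mean_dist(i, m) for other, m in clusters.items() if other != label)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def best_labeling(points, candidates):
    """Return (name, score) of the candidate labeling with the highest score."""
    scored = {name: silhouette_score(points, labels)
              for name, labels in candidates.items()}
    name = max(scored, key=scored.get)
    return name, scored[name]


# Toy 2-D feature vectors (hypothetical, not from the talk).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
candidates = {
    "two_blobs": [0, 0, 0, 1, 1, 1],
    "interleaved": [0, 1, 0, 1, 0, 1],
}
best_name, best_score = best_labeling(points, candidates)
print(best_name, round(best_score, 2))
```

The well-separated labeling wins here because each point sits much closer to its own cluster than to the other one, which is what a high average score rewards.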
19 #SNAP_R
Automated Social Spear Phishing

20 #SNAP_R
Recon and Footprinting for Profiling
! Compute histogram of tweet timings (binsize = 1 hour)
! Tweet at a random minute within the user's most active hour
! Bag of Words on timeline tweets
! Select most commonly occurring non-stopword
! We seed the neural network with topics that the user frequently posts about
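
The two profiling steps above (an hourly histogram of post times, and a bag-of-words count with stopwords removed) can be sketched with the standard library alone. This is an illustrative sketch, not the authors' code: the stopword list is a tiny stand-in, the tokenization is deliberately naive, and the sample timestamps and posts are invented.

```python
import random
from collections import Counter
from datetime import datetime

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "on",
             "is", "it", "i", "my", "for", "at", "this", "that", "with"}


def busiest_hour(timestamps):
    """Histogram of post times with 1-hour bins; return the fullest bin."""
    counts = Counter(ts.hour for ts in timestamps)
    return counts.most_common(1)[0][0]


def random_minute(rng):
    """Pick a random minute (0-59) inside the chosen hour."""
    return rng.randrange(60)


def top_topic(posts):
    """Bag of words over the posts; return the most common non-stopword."""
    words = Counter()
    for post in posts:
        for token in post.lower().split():
            token = token.strip(".,!?:;\"'()")
            if token and token not in STOPWORDS:
                words[token] += 1
    return words.most_common(1)[0][0]


# Invented sample data.
timestamps = [
    datetime(2016, 8, 1, 9, 5),
    datetime(2016, 8, 1, 21, 10),
    datetime(2016, 8, 2, 21, 40),
    datetime(2016, 8, 3, 21, 15),
]
posts = ["I love my cat", "cat pictures are the best", "Coffee and a cat"]

hour = busiest_hour(timestamps)
minute = random_minute(random.Random(0))
topic = top_topic(posts)
print(hour, minute, topic)
```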
21 #SNAP_R
Leveraging Markov Models
! Popular for text generation: see /r/SubredditSimulator, InfosecTalk TitleBot
! Calculates pairwise frequency of tokens and uses that to generate new ones
! Based on transition probabilities
! Trained using most recent posts on the user's timeline
[Diagram: example Markov chain over the tokens "I", "see", "don't", "like", "ML", "infosec" and ".", with transition probabilities 1, 0.38, 0.62, 0.54 and 0.46 on its edges]
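
As a generic illustration of the idea on this slide, a first-order (bigram) Markov text model counts how often each token follows another and then samples the next token in proportion to those counts. The sketch below is a textbook version under that assumption, not the tool's implementation; the three-sentence corpus and the start/end markers are invented for the example.

```python
import random
from collections import defaultdict

START, END = "<s>", "</s>"  # hypothetical sentence boundary markers


def train(posts):
    """Count token-to-next-token transitions over whitespace-split posts."""
    transitions = defaultdict(lambda: defaultdict(int))
    for post in posts:
        tokens = [START] + post.split() + [END]
        for cur, nxt in zip(tokens, tokens[1:]):
            transitions[cur][nxt] += 1
    return transitions


def transition_probs(transitions, token):
    """Turn the raw counts for one token into transition probabilities."""
    counts = transitions.get(token, {})
    total = sum(counts.values())
    return {nxt: c / total for nxt, c in counts.items()} if total else {}


def generate(transitions, rng, max_tokens=20):
    """Walk the chain from START, sampling each step by transition count."""
    out, token = [], START
    while len(out) < max_tokens:
        choices = transitions[token]
        token = rng.choices(list(choices), weights=list(choices.values()))[0]
        if token == END:
            break
        out.append(token)
    return " ".join(out)


corpus = ["I like ML", "I like infosec", "I see ML"]
model = train(corpus)
probs_after_i = transition_probs(model, "I")
sentence = generate(model, random.Random(0))
print(probs_after_i, sentence)
```

Because the model only ever sees tokens from its training text, it works unchanged on any language, but it also tends to reproduce fragments of that text closely when the corpus is small.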
22 #SNAP_R
Training a Recurrent Neural Network
! Hosted on Amazon EC2
! Trained on g2.2xlarge instance (65¢ per hour)
! Ubuntu (ami-c79b7eac)
! Training set > 2M tweets
! Took 5.5 days to train
! 3 layers, ~500 units/layer
LSTM = Long Short-Term Memory
Illustration: Chris Olah (@ch402)
LSTMs: Hochreiter & Schmidhuber, 1997
23 #SNAP_R
Tradeoffs and Caveats
Metric           LSTM     Markov Chain
Training Speed   Days     Seconds
Accuracy         High     Medium
Availability     Public   Public
Size             Large    Small
Caveats (LSTM):
• Deeper representation of natural language, generalizes well
• Retraining required for new languages
Caveats (Markov Chain):
• Overfits to each user, can create temporally irrelevant tweets
• Performs poorly on users with few tweets
24 #SNAP_R
Language and Social Network Agnosticism
! Markov models only use content on user's timeline, which means they can automatically generate content in other languages
! For neural nets, you'd only need to scrape data from the target language and retrain
! Both of these methods can also be applied to other social networks
25 #SNAP_R
Evaluation and Metrics

26 #SNAP_R
Here’s a malicious URL...
27 #SNAP_R
And, apparently goo.gl lets us shorten it!
28 #SNAP_R
goo.gl also gives us analytics
29 #SNAP_R
Results and Demo

30 #SNAP_R
Wild Testing #SNAP_R
31 #SNAP_R
Pilot Experiment
! Via #SNAP_R we sent 90 "phishing" posts out to people using #cat
! After 2 hours, we had 17% clickthrough rate
! After 2 days, we had between 30% and 66% clickthrough rate
! Inside the Data
  ! goo.gl showed 27 clickthroughs (30%) came from a t.co referrer
  ! Unknown referrers might be caused by bots
  ! With unique locations, clickthrough rate may be as high as 66%
32 #SNAP_R
Man vs. Machine 2 Hour Bake Off
Metric           Person   SNAP_R
Total Targets    ~200     819
Tweets/minute    1.67     6.85
Click-throughs   49       275
Observations (Person): Copy/pasting messages to different hashtags
Observations (SNAP_R): Arbitrarily scalable with the number of machines
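
The table gives raw counts; the per-target click-through rates are not stated on the slide but follow directly from them. The short calculation below derives them, assuming one tweet per target, a bake-off of exactly 120 minutes, and treating the approximate "~200" as 200. Under those assumptions the SNAP_R posting rate comes out near 6.83 per minute, close to the 6.85 shown, so the slide presumably used slightly different inputs or rounding.

```python
DURATION_MIN = 120  # assumed: the "2 hour" bake-off lasted exactly 120 minutes

# Counts copied from the table; "~200" is treated as 200.
results = {
    "Person": {"targets": 200, "clicks": 49},
    "SNAP_R": {"targets": 819, "clicks": 275},
}


def rates(targets, clicks, minutes=DURATION_MIN):
    """Derive posting rate and click-through rate from raw counts."""
    return {
        "tweets_per_min": targets / minutes,
        "click_rate": clicks / targets,
    }


summary = {name: rates(**counts) for name, counts in results.items()}
for name, r in summary.items():
    print(f"{name}: {r['tweets_per_min']:.2f} tweets/min, "
          f"{r['click_rate']:.1%} click-through")
```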
33 #SNAP_R
DEMO of #SNAP_R
34 #SNAP_R
Wrap Up
35 #SNAP_R
Potential Use Cases
! Social media security awareness
! Social media security education
! Automated internal pentesting
! Social engagement
! Staff Recruiting
36 #SNAP_R
Mitigations
! Of course, we're white hats here…
! But machine learning is rapidly becoming automated, so black hats would have this capability soon.
! Protected accounts are immune to timeline scraping, which defeats the tool
! Bots can be detected
! Standard mitigations apply:
  ! Don't click on links from people you don't know
  ! Report! Twitter is pretty good at flagging spam accounts
! Maybe URL shorteners should be responsible for malware?
37 #SNAP_R
Black Hat Sound Bytes
! Machine learning can be used offensively to automate spear phishing
! Machine-generated grammar is bad, but Twitter users DGAF
! Abundant personal data is publicly accessible and effective for social engineering
38 #SNAP_R
?
39 #SNAP_R
John Seymour (@_delta_zero) | Philip Tully (@phtully)
We'll also be at the booth immediately after the presentation!
