
Hunting Malicious TLS Certificates

With Deep Neural Networks


Real Time SSL & TLS Abuse Detector

David Camacho – Lead Data Architect


Alejandro Correa Bahnsen – VP. Research
Common Phishing Scams

What is a Web Certificate?
Phishing TLS Increment

Images from: https://www.thesslstore.com/blog/https-phishing-green-padlock/


How Do People Recognize a Safe Web Site?
Forrester survey asked users: “Some websites receive the following
browser user interface security indicator in the browser. What do
you think the security indicator is intended to tell users?”
Secure | https://ultrabank.com
The website is safe: 82%

The website is encrypted: 75%

The website is trustworthy: 66%

The website is private: 32%


Malware Abuse of TLS

Malware free
Malware Abuse of TLS

encrypted
We want to identify malicious
certificates in real time!
Can we detect a “bad” certificate on the fly?

Certificate + URL → Malware Cert Hunter → Safe site
Certificate + URL → Malware Cert Hunter → Malicious site
Hunting Malicious TLS Certificates with
Deep Neural Networks
Cert-Hunter Data
Data Collected:
• 1,000,000 certificates in legitimate use
• 5,000 certificates used for phishing
• 3,000 certificates used by malware
90%+ of TLS attacks use non-validated certificates
55% of legitimate businesses use non-validated TLS certificates, but 100% of them use real information
90% of malicious certificates contain common names like:
• Example.com
• Localhost
• Domain.com
• localdomain
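
As an illustration of the observation above, a placeholder common name can be flagged with a few lines of Python. This is a minimal sketch, not the detector from the talk: the helper names `common_name` and `has_placeholder_cn` are hypothetical, and the name list contains only the four examples on the slide.

```python
import re

# Placeholder common names taken from the slide. A production list would
# presumably be longer and curated (assumption).
PLACEHOLDER_NAMES = {"example.com", "localhost", "domain.com", "localdomain"}


def common_name(subject):
    """Return the CN value from a 'CN = x, O = y' style subject, or None."""
    match = re.search(r"(?:^|,)\s*CN\s*=\s*([^,]+)", subject)
    return match.group(1).strip() if match else None


def has_placeholder_cn(subject):
    """True when the subject's CN is one of the known placeholder names."""
    cn = common_name(subject)
    return cn is not None and cn.lower() in PLACEHOLDER_NAMES
```

A check like this would be one input among many, since a missing or unusual CN alone does not prove that a certificate is malicious.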
TLS Certificate Examples
Legitimate Certificates from Alexa Top Million
CN = *.stackexchange.com, O = Stack Exchange, Inc., L = New York, S = NY, C = US

Phishing Certificates from Phishtank


CN = localhost, L = Springfield

Malware Certificates from Abuse.ch, Censys.io & Rapid7


O=Dis, L=Springfield, S=Denial, C=US

Feature engineering
We created 40 features divided into 4 categories:

Boolean: Boolean matrix indicating which fields the certificate has

SOC: Company’s SOC experience features.

Prev_work: Features inherited from previous work (latest state of the art)

Text: Statistical features extracted from subject and issuer strings
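
Two of the four categories, Boolean and Text, can be sketched directly from a subject or issuer string. The code below is an assumption-laden illustration: the slides do not list the 40 features, so the field set, the function names, and the choice of length and Shannon entropy as "statistical" features are all hypothetical.

```python
import math
import re
from collections import Counter

# Field abbreviations as they appear in the slide examples (S = state).
FIELDS = ("CN", "O", "OU", "L", "S", "C")
_FIELD_RE = re.compile(r"(?:^|,)\s*(CN|OU|O|L|S|C)\s*=\s*")


def parse_principal(principal):
    """Split 'CN = a, O = b, Inc., C = US' into {'CN': 'a', 'O': 'b, Inc.', ...}."""
    # With one capture group, re.split yields [prefix, key1, val1, key2, val2, ...].
    parts = _FIELD_RE.split(principal)
    return {key: value.strip() for key, value in zip(parts[1::2], parts[2::2])}


def shannon_entropy(text):
    """Character-level Shannon entropy in bits; 0.0 for an empty string."""
    if not text:
        return 0.0
    total = len(text)
    return -sum(n / total * math.log2(n / total) for n in Counter(text).values())


def extract_features(principal):
    """Boolean field-presence flags plus a few simple text statistics."""
    fields = parse_principal(principal)
    features = {"has_" + name: name in fields for name in FIELDS}
    features["length"] = len(principal)
    features["n_fields"] = len(fields)
    features["entropy"] = shannon_entropy(principal)
    return features
```

For the phishing example `CN = localhost, L = Springfield`, this yields `has_CN` and `has_L` set and every other presence flag cleared, which is the kind of sparse profile the Boolean category is meant to capture.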
LSTM
(Long Short-Term Memory)
RNN

Short term context: “The dog is…”
What comes next?
• Excited
• Hungry
• Green
• ….
• Affordable
Most probable: Hungry
RNN

Long term context: “When it sees its owner”
Short term context: “the dog is…”
What comes next?
• Excited
• Hungry
• Green
• ….
• Affordable
Most probable: Hungry
LSTM

Long term context: “When it sees its owner”
Short term context: “the dog is…”
What comes next?
• Excited
• Hungry
• Green
• ….
• Affordable
Most probable: Excited
Deep Learning Architecture
Subject Principal: One-hot encoding → Embedding → LSTM → Dropout
Issuer Principal: One-hot encoding → Embedding → LSTM → Dropout
Extracted Features: Dense/ReLU → Dropout

Concatenate → Dense/ReLU → Dropout → Dense/Logit → score
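
The first stage of the two text branches, turning the subject and issuer strings into fixed-length character sequences, might look like the sketch below. The vocabulary (printable ASCII), the sequence length of 64, and the padding and unknown-character conventions are assumptions; the slides do not state them. The sketch stops at integer indices because an embedding lookup on an index is equivalent to multiplying the matching one-hot vector by the embedding matrix.

```python
import string
from typing import List

# Assumed vocabulary: printable ASCII. Index 0 is reserved for padding and
# index 1 for any character outside the vocabulary.
PAD, UNKNOWN = 0, 1
VOCAB = {ch: i + 2 for i, ch in enumerate(string.printable)}
MAX_LEN = 64  # assumed sequence length, not given in the slides


def encode(principal: str, max_len: int = MAX_LEN) -> List[int]:
    """Map a principal string to exactly max_len integer character indices."""
    ids = [VOCAB.get(ch, UNKNOWN) for ch in principal[:max_len]]
    # Right-pad short strings so every sequence has the same length.
    return ids + [PAD] * (max_len - len(ids))


def one_hot(index: int, size: int = len(VOCAB) + 2) -> List[int]:
    """Explicit one-hot vector for a single character index."""
    return [1 if i == index else 0 for i in range(size)]
```

The subject and issuer strings would each be encoded this way and fed to their own Embedding and LSTM layers, while the engineered features bypass this step and go straight to the dense branch.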
Training process

lr=0.005
Malicious Cert Classification Results (Phishing)

5-Fold CV    Accuracy    Recall    Precision
Average      86.41%      83.20%    88.86%
Deviation    1.22%       3.29%     1.04%

Malicious Cert Classification Results (Malware)

5-Fold CV    Accuracy    Recall    Precision
Average      94.65%      95.09%    94.28%
Deviation    0.09%       0.29%     0.11%
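
For reference, the three reported metrics follow from the confusion-matrix counts of each fold, and the Average and Deviation rows summarize the five per-fold values. The sketch below uses the standard definitions; the function names are hypothetical, and the use of the population standard deviation is an assumption, since the slides do not say which variant was used.

```python
from statistics import mean, pstdev


def classification_metrics(tp, fp, tn, fn):
    """Accuracy, recall and precision, with 'malicious' as the positive class."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "recall": tp / (tp + fn),        # share of malicious certs that were caught
        "precision": tp / (tp + fp),     # share of flagged certs that were malicious
    }


def summarize(fold_scores):
    """Mean and population standard deviation of one metric across folds."""
    return mean(fold_scores), pstdev(fold_scores)
```

With hypothetical counts of 90 true positives, 10 false positives, 85 true negatives, and 15 false negatives, `classification_metrics` gives an accuracy of 0.875 and a precision of 0.9.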

Takeaways
• It is possible to differentiate malicious certificates from
legitimate ones because of how attackers create their certificates.

• Attackers won’t expose themselves by allowing exhaustive validation.

• Phishers are the most sophisticated attackers because they
want to look real.
Thanks
@luisdcamachog

luis.camacho@cyxtera.com

https://www.linkedin.com/in/luisdcamachog/

https://github.com/LuisDavidCamacho
