
INTERPOLIQUE

(OR, THE ONLY GOOD DEFENSE IS THROUGH A BRUTAL OFFENSE)
Dan Kaminsky, Chief Scientist
Recursion Ventures
dan@recursion.com
ANNOUNCEMENT
 This is my new company. Woot.
 Recursion productizes significant research
 It’s time to do things a little differently
 This talk isn’t a sales pitch for Recursion, but it’s an idea
regarding its philosophy
A STORY
 Design flaw in SSL
 The server thought it was resuming, the client thought it was
connecting
 Project Mogul spawned to fix it
 Several months in deep secrecy
 Thousands of hours spent on IETF fix

 The fix broke <1% of servers


 No big deal, right?
REALITY
 “Note that to benefit from the fix for CVE-2009-3555
added in nss-3.12.6, Firefox 3.6 users will need to set
their security.ssl.require_safe_negotiation preference to
true. In Mandriva the default setting is false due to
problems with some common sites.” – Mandriva Patch
Notes
 They thought knocking out a few sites was acceptable for a
remediation
 They were wrong
THE BAD NEWS
 We give bad advice
 Pen testers are very good at breaking things
 Our “remediation” advice tends towards myopia
 We consider only our own engineering requirements
 We assume tools are static, and bash the craftsman
THE GOOD NEWS
 We are the keys to there actually being good advice
 We are the one community that actually knows how things
break
 We hold the knowledge to end the bugs we keep seeing
SESSION MANAGEMENT
A SIMPLE QUESTION
 When I log into two SSH servers, do I need to worry
about one accessing the other?
 No

 When I log into two web sites, do I need to worry about
one accessing the other?
 Yes

 Why?
 Because SSH does not have totally broken session
management
SIMPLE THINGS, SIMPLY BROKEN
 The web was never designed to have authenticated
resources
 Auth was bolted on (because Basic/Digest never got fixed)
 Normal Mechanism For Managing Credentials
 Password causes Set-Cookie
 Cookie sent with each query to target domain
 Cookie is sent even with requests caused by third party
domains
 User’s credentials are mixed with attacker’s URL
 This is why most XSS/XSRF attacks are dangerous

 Cross Site Scripting and Cross Site Request Forgery wouldn’t be
nearly the big deal they are if they didn’t work cross site
THE PEN TESTER REACTION:
DEV, DO MORE WORK
 XSRF Tokens
 Manually add a token to every authenticated URL
 Requires touching everything in a web app that generates a
URL
 How’s that working out for us?
 This seems to be a lot of work
 If/when we come back six months later, it’s not usually done,
is it?
A MODEST PROPOSAL
 Couldn’t the tools be better?
 The big debate: Should SVGs animate?
 Unsaid: Shouldn’t it be possible to easily log into a web site
without other sites being able to use your creds?
AN ATTEMPT
 A fix that requires no change to the browser is better
 So I tried to find one
 Server Side Referrer Checking
 Client Side Referrer Checking

 Window.Name Checking

 Window.SessionStorage Checking

 It says SessionStorage! Surely it’s perfect for Session
Management!
 They all failed
 Thank you Cstone, Kuza55, Amit Klein, David Ross, SirDarckcat
WHEN FAILURE IS SUCCESS:
OUR PROBLEM WITH LATENCY
 My suggested defenses were defeated early in
development
 We, as a community, have a latency problem
 We don’t break during development
 We don’t break at release

 We don’t break when early adopters are deploying

 We break only when it gets really popular

 By then, it’s in customer hands, and the best we can do is give the
customers really expensive advice on how to fix it


 We need to close the feedback loop
AT MINIMUM
 Whatever’s going on with other defenses, I want mine to
be thoroughly, even brutally audited as soon as possible
 Life is too short to back broken code!
 Session Management will require modifications to the
browser
 Something else might not…
ON LANGUAGES
 "The bottom-line is that there just isn't a large
measurable difference in the security postures from
language to language or framework to framework --
specifically Microsoft ASP Classic, Microsoft .NET,
Java, Cold Fusion, PHP, and Perl. Sure in theory one
might be significantly more secure than the others, but
when deployed on the Web it's just not the case."
--Jeremiah Grossman, CTO, WhiteHat Security (a
guy who has audited a lot of web applications)
 Question: Why aren’t the type safe languages safer
against web attack than the type unsafe languages?
WE AREN’T ACTUALLY USING THEM
 Reality of web development
 HTML and JavaScript and CSS and XML and SQL and
PHP and C# and…
 “On the web, every time you sneeze, you’re writing in a new
language”
 How do we communicate across all these languages?
 Strings

 And how type safe are strings?


 Not at all
ALL INJECTIONS ARE TYPE BUGS
 select count(*) from foo where x='x' or '1'='1';
 The C#/PHP/Java/Ruby sender thinks there’s a string there.
 The SQL receiver thinks there’s a string, a concatenator,
another string, a comparator, and another string there.
 The challenge: Maintaining type safety across language
boundaries
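A minimal sketch of the type mismatch above, written in JavaScript rather than the talk's PHP. The helper names `naiveQuery` and `countStringLiterals` are hypothetical, and the literal counter is a crude stand-in for a SQL lexer, not a real parser. The sender passes one string; the receiver ends up seeing three.

```javascript
// Naive string concatenation, as in the slide's example.
function naiveQuery(x) {
  return "select count(*) from foo where x='" + x + "';";
}

// Hypothetical helper: counts single-quoted literals roughly the way a
// SQL lexer would see them (escaped quotes are ignored for this demo).
function countStringLiterals(sql) {
  return (sql.match(/'[^']*'/g) || []).length;
}

const benign = naiveQuery("x");
const hostile = naiveQuery("x' or '1'='1");

console.log(hostile);
console.log(countStringLiterals(benign), countStringLiterals(hostile));
```

In both calls the sender believes it handed over exactly one string value. Only the receiver's count changes.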
ISN’T THIS A SOLVED PROBLEM?
 Escaping?
 Parameterized Queries?
NO ESCAPE
 $conn->query("select * from foo where x=\"$foo\";");
 Is this secure or not?
 Who knows, depends on whether $foo has been escaped
between when it first came in on the wire, and when it’s
being passed into the DB
 This simple line of code is expensive to debug!
 If somebody removes the escape(), the code still works
 “Fails open”
ACCIDENTAL ESCAPE
 What does it mean to escape?
 “Block Evil Characters”
 Was very easy to determine evil characters when we just had ASCII
 Only 256 possible bytes

 Unicode changes that

 Over a million possible code points

 All of which could mutate (“best fit match”) into one another

 All of which have multiple possible encodings, and
representations within encodings


 Escaping works by accident, without a solid contract
 Keeps getting updated
 escape(), encodeURI(), encodeURIComponent()
WHAT ABOUT PARAMETERIZED
QUERIES?
 Which would you rather write?
 $r = $m->query("SELECT * from foo where
fname='$fname' and lname='$lname' and address='$address'
and city='$city'");
 $p->prepare("SELECT * from foo where fname=? and
lname=? and address=? and city=?");
$p->set(1, $fname);
$p->set(2, $lname);
$p->set(3, $address);
$p->set(4, $city);
$r = $m->queryPrepared($p);
REALITY OF PARAMETERIZED QUERIES
 No developer has ever written a parameterized query
without a gun to his head
 We should know
 We hold the gun
POSITIONAL GENERATION ISN’T ANY
BETTER (C/O MIKE SAMUEL)
O(N) UI WORK FAILS
(BEST CASE EYE TRACKING)
HOW INJECTIONS HAPPEN /
HOW DEVS LIKE TO WRITE CODE
 String Interpolation:
select count(*) from foo where x='$_GET["foo"]';
 String Concatenation:
"select count(*) from foo where x=\"" + $_GET["foo"]
+ "\";";
 Why they write code this way
 Devs are thinking inline
 They want to be writing inline
 See: Fitts’ Law
IS IT POSSIBLE…
 …to let devs write inline code, without exposing the
resultant strings to injections?
 Yes – by making String Interpolation smarter
 RETAIN: The language still sees the boundary between the
environment (“select * from…”) and the variable ($_GET…).
 TRANSLATE: Given that metadata, the language can do smarter
things than just slap unprocessed strings together


 (This overlaps with, and extends, Mike Samuel’s
excellent “Secure String Interpolation” work, seen at
http://tinyurl.com/2lbrdy.)
 Working with Mike
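The RETAIN/TRANSLATE idea can be sketched with JavaScript tagged templates, where the language hands the tag function the literal parts and the interpolated values separately. This is an illustration under assumptions, not Interpolique's implementation: `interpolator`, `blind`, and `hexed` are hypothetical names, and `unhex()` is used only because the talk mentions MySQL hex/unhex as a fallback.

```javascript
// Hypothetical factory: given a per-value translator, return a tag function.
function interpolator(translate) {
  return function tag(literals, ...values) {
    // RETAIN: `literals` is the developer's code, `values` is the data.
    let out = literals[0];
    values.forEach((value, i) => {
      // TRANSLATE: each value passes through the handler before joining.
      out += translate(value) + literals[i + 1];
    });
    return out;
  };
}

// A "blind" interpolator, equivalent to ordinary string interpolation.
const blind = interpolator((v) => String(v));

// A toy translator that hex-encodes data so it cannot alter the grammar.
const hexed = interpolator(
  (v) => "unhex('" + Buffer.from(String(v), "utf8").toString("hex") + "')"
);

const foo = "x' or '1'='1";
console.log(blind`select * from foo where x='${foo}';`);
console.log(hexed`select * from foo where x=${foo};`);
```

The developer still writes the query inline. Only the handler that sits on the boundary changes.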
INTERPOLIQUE DEMO [0]
INTERPOLIQUE DEMO [1]
INTERPOLIQUE DEMO [3]
INTERPOLIQUE DEMO [4]
INTERPOLIQUE DEMO [5]
 Submit
if($_POST["action"] == "add"){
$conn->query(eval(b(
'insert into posts values(^^_POST[author] ,
^^_POST[content] );'
)));}
 Return
$r = $conn->query("select * from posts");
while($row = $r->fetch_assoc()) { echo eval(sb(
'data: ^^row[author] ^^row[content]<br>\n'
)); }
WHAT’S GOING ON
 Language interpolators are blind – they just push strings
into strings
 So we write custom interpolators – the dev puts in what he
wants, the compiler sees what it needs
WHAT TO INTERPOLATE INTO
 Parameterized Queries are an obvious target
 Programmer writes:
select * from table where fname=^^fname and
country=^^country and x=^^x;
 Interpolique expands:
$statement = $conn->prepare("select * from table where
fname=? and country=? and x=? ");
$statement->bind_param("sss", $fname, $country, $x);
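A sketch of the same expansion in JavaScript, assuming a tagged template plays the role of the custom interpolator. The tag name `sql` and the `{ text, params }` result shape are hypothetical, and the sample values are made up. Handing the pair to a real database driver is left out.

```javascript
// Hypothetical tag: turns an inline query into placeholder text plus a
// parameter list, which is what a prepared-statement API wants.
function sql(literals, ...values) {
  return {
    text: literals.join("?"),           // code, with one ? per value
    params: values.map((v) => String(v)) // data, kept out of the text
  };
}

const fname = "Dan";
const country = "US";
const x = "x' or '1'='1";

const q = sql`select * from table where fname=${fname} and country=${country} and x=${x};`;

console.log(q.text);
console.log(q.params);
```

The hostile value of `x` never appears in `q.text`, so it has no opportunity to be parsed as SQL.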
COULD DO ESCAPES…
 …but no faith they actually work correctly
BASE64: ESCAPING DONE RIGHT
 Programmer writes:
select * from table where fname=^^fname and
country=^^country and x=^^x;
 Interpolique expands:
select * from table where
fname=b64d("VEhJUyBJUyBUSEUgU1RPUlkgQUxMI
EFCT1VUIEhPVyBNWSBMSUZFIEdPVCBUVVJOR
UQgVVBTSURFIERPV04=") and
country=b64d("d2Fzc3Nzc3Nzc3Nzc3Nzc3NzdXA=")
and x=b64d("eXl5eXk=") ;
WHY THIS WORKS

 Type safe going into b64d() function


 That’s never getting interpreted as anything but a string
 Type safe coming out of b64d() function
 b64d() is cast to return a string
 Not a subquery, not a conditional, not anything other than a
string
 b64d() is a MySQL UDF that’s already written, has no apparent
time penalty, and will be released with Interpolique
 Most other databases already have B64 support
 In a pinch, could use MySQL hex/unhex
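The Base64 expansion can be sketched as a JavaScript tagged template. The tag name `b64` is hypothetical, `Buffer` assumes a Node.js runtime, and `b64d()` is the database-side function the slides describe. This is not the talk's PHP code.

```javascript
// Hypothetical tag: wraps every interpolated value in b64d("...").
function b64(literals, ...values) {
  let out = literals[0];
  values.forEach((value, i) => {
    const encoded = Buffer.from(String(value), "utf8").toString("base64");
    // The Base64 alphabet (A-Z, a-z, 0-9, +, /, =) has no quote characters,
    // so the value cannot break out of the surrounding string literal.
    out += 'b64d("' + encoded + '")' + literals[i + 1];
  });
  return out;
}

const x = "yyyyy";
console.log(b64`select * from table where x=${x};`);
// prints: select * from table where x=b64d("eXl5eXk=");

const hostile = "x\" or \"1\"=\"1";
console.log(b64`select * from table where x=${hostile};`);
```

The first output matches the `x=b64d("eXl5eXk=")` fragment on the slide, which decodes to `yyyyy`.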
TWO MODES OF BASE64
 Late binding
 Interpolation inserts the Base64 handler
 Text is plain until right before it crosses the frontend/backend layer
 SQL looks like this:
select * from foo where x=^^foo;
 Early Binding
 Base64 the variable as soon as it comes in off the HTTP request
 SQL looks like this:
select * from foo where x=b64d($foo);
 Pen testers: If somebody fails to escape $foo, everything
still works. If somebody fails to Base64 Encode $foo,
everything breaks immediately
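A rough sketch of the fail-closed property in early binding. The JavaScript `b64d` below is only a stand-in for the database-side function, assumed here to reject anything that is not well-formed Base64. `encode` is a hypothetical helper.

```javascript
// Stand-in for the database-side b64d(): strict, rejects non-Base64 input.
function b64d(s) {
  if (s.length % 4 !== 0 || !/^[A-Za-z0-9+\/]*={0,2}$/.test(s)) {
    throw new Error("b64d: argument is not Base64");
  }
  return Buffer.from(s, "base64").toString("utf8");
}

// Early binding: encode the value as soon as it arrives.
const encode = (s) => Buffer.from(s, "utf8").toString("base64");

const raw = "x' or '1'='1";

// Encoded on the way in: round-trips as a plain string.
console.log(b64d(encode(raw)) === raw);

// Developer forgot to encode: the query breaks instead of "working".
let failedClosed = false;
try {
  b64d(raw);
} catch (e) {
  failedClosed = true;
}
console.log(failedClosed);
```

One caveat on this sketch: a raw value that happens to use only Base64 characters would decode to garbage rather than throw, so the failure shows up as wrong data instead of an error.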
STATIC ANALYSIS
 You know what’s better than having a static analyzer?
AHEM
 Not needing a static analyzer
BASE64 IN THE OTHER DIRECTION
 <span id=3520750 b64text="Zm9v">___</span><script>do_decode(3520750)</script>
 Create a SPAN with a random ID and a dynamic attribute that
contains its base64’d content
 Call do_decode with that ID, which can now look up the
element in O(1) time
 Use this construction to retain streamability
 Thank/Blame CP for this
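A sketch of how the server side might emit that construction, in JavaScript for a Node.js runtime. The function name `b64span` is hypothetical, and using `Math.random()` for the ID is an assumption made for brevity. `do_decode` is the client-side function named on the slide.

```javascript
// Hypothetical generator for the span + script pair shown above.
function b64span(text, id = Math.floor(Math.random() * 1e7)) {
  const encoded = Buffer.from(String(text), "utf8").toString("base64");
  // Base64 output contains no quotes or angle brackets, so it is inert
  // inside the double-quoted attribute.
  return (
    "<span id=" + id + ' b64text="' + encoded + '">___</span>' +
    "<script>do_decode(" + id + ")</script>"
  );
}

console.log(b64span("foo", 3520750));
// prints: <span id=3520750 b64text="Zm9v">___</span><script>do_decode(3520750)</script>

console.log(b64span("<script>alert(1)</script>"));
```

Because each fragment is complete on its own, it can be written to the response as soon as it is generated, which is the streamability point above.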
DOM INTERACTION: SIMPLE
 Push to textContent
 ob= document.getElementById(id); ob.textContent =
Base64.decode(ob.getAttribute("b64text"));
 We never go through the browser HTML parser
DOM INTERACTION: COMPLEX
 Push to appropriate createElements
 ob = document.getElementById(id);
raw = Base64.decode(ob.getAttribute("b64text"));
safeParse(raw, ob);
 HTMLParser(src, {
start: function( tag, attrs, unary ) {
if(tag == "i" || tag == "b" || tag == "img" || tag == "a"){
el = document.createElement(tag);
 Basic idea is to have a simple HTML parser that extracts what it can,
creates elements according to whitelisted rules, and importantly, never
goes through the browser HTML parser
 See also: “Blueprint”, a system that moves all DOM generation
to JS
 http://www.cs.uic.edu/~venkat/research/papers/blueprint-oakland09.pdf
IMPORTANT NOTE
 Security Is Quantized
 There’s a set of elements that can be safely exposed
 There’s a set that can’t
 The game is to expose only those tags and attributes that
don’t expand to arbitrary JS
 Either you have prevented wishing for more wishes, or you have
not
 (We see this from the webmail attack surface)
HOW THIS WORKS
 Primary Mechanism: Eval
 Yes, there’s risk here, and yes we’re going to talk about that risk –
we need this for scoping reasons
 Programmer written query: select * from table where
fname=^^fname and country=^^country and x=^^x;
 To Eval: return ("select * from table where fname=b64d(\"" .
base64_encode($fname) . "\") and country=b64d(\"" .
base64_encode($country) . "\") and x=b64d(\"" . base64_encode($x)
. "\") ;");
 Eval Out: select * from table where
fname=b64d("VEhJUyBJUyBUSEUgU1RPUlkgQUxMIEFCT1VUI
EhPVyBNWSBMSUZFIEdPVCBUVVJORUQgVVBTSURFIERPV
04=") and country=b64d("d2Fzc3Nzc3Nzc3Nzc3Nzc3NzdXA=")
and x=b64d("eXl5eXk=")
CAN WE OPERATE WITHOUT EVAL?
 No Eval in Java or C#
 One approach: Combine variable argument functions
with string subclass tagging
 public bwrap w = new bwrap();
w.s(w.c("select * from foo where x="), argument1, w.c("and
y="), argument2);
 If you forget to mark the safe code, it breaks

 Another approach:
 w.code("select * from foo where
x=").data(argument1).code("and y=").data(argument2)
 Similar to LINQ etc. but actually works for arbitrary grammars
 If you mismark code as data, or vice versa, it breaks
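The second, eval-free approach can be sketched as a small fluent builder. This is JavaScript rather than Java or C#. The class name `Query` is hypothetical, the `code()`/`data()` method names follow the slide, and `Buffer` assumes Node.js.

```javascript
// Hypothetical builder: code() appends SQL text, data() appends a value
// wrapped in b64d("...") so it can only ever be a string.
class Query {
  constructor() {
    this.parts = [];
  }
  code(sqlText) {
    this.parts.push(sqlText);
    return this;
  }
  data(value) {
    const encoded = Buffer.from(String(value), "utf8").toString("base64");
    this.parts.push('b64d("' + encoded + '")');
    return this;
  }
  toString() {
    return this.parts.join("");
  }
}

const good = new Query()
  .code("select * from foo where x=").data("yyyyy")
  .code(" and y=").data("foo")
  .toString();
console.log(good);
// prints: select * from foo where x=b64d("eXl5eXk=") and y=b64d("Zm9v")

// Mismarking code as data: the SQL keyword text becomes an inert string,
// so the statement is no longer a valid query and fails right away.
const mismarked = new Query().data("select * from foo where x=").data("a").toString();
console.log(mismarked);
```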
THE STATUS QUO
 We see this doesn’t work:
 String q = "select * from foo where x = \"" + escape(s) +
"\";";
 By “doesn’t work”, I mean it is too similar to this:
String q = "select * from foo where x = \"" + s + "\";";
 Devs mess this up, but the code works anyway

 As a matter of principle, devs will do enough work to
make the code function
 If it works, it should work securely
 If it isn’t working securely, it shouldn’t be working at all
 The trick is to not make it easier to get around the security, than it is
to do things right
WHY CUSTOM INTERPOLATORS ARE
HARD: THE ANCIENT SCOPE WAR
 Lexical Scope: Scope Known At Compile Time
 Variables are “pushed” into child scopes
 Dynamic Scope: Scope Determined At Run Time
 Variables are “pulled” by child scopes
 Lexical scope has won, and has systematically removed
methods that allow any code to access variables not
explicitly pushed in
 This makes it rather difficult to write a function that sees
^^variable and thus dereferences that variable
 There are silly “superclass” or “parent” modifiers in some
languages, but they’re all special case
 In Java and C#, they went so far as to leave local variables
unnamed on the stack, so you couldn’t just hop into previous stack
frames and dereference from there!
TO BE CLEAR
 Yes, there is risk to eval, and we’ll be talking about it
 Yes, there are very nice and very good reasons for lexical scope
to be the default state
 The fact that the vast majority of programming languages,
type safe or not, are repeatedly found to expose injection
flaws is a direct sign that something is wrong
 Put simply, language design needs to be informed by the bloody
findings of pen testers
 It is informed by performance engineers
 It is informed by usability engineers

 Memory safety didn’t come from security engineers, it came from reliability
engineers
 I think we need a way to write functions that execute in present scope
YES, THIS MEANS
 (LISP) (WAS) (RIGHT)
 (((NOT ABOUT EVERYTHING)))
 (((THEY ( HAD A POINT ( HERE ))))
 Crazy theory
 JavaScript has been successful because it’s been able to
mutate to absorb almost any language construct
 “More dialects of JavaScript than Chinese”
RISKS
 There are three things that can go wrong with any defensive
technology
 It doesn’t work
 None of this mealy mouthed, “well, it depends on what your threat model
is”
 Either it does what it says, or it doesn’t!

 It doesn’t work in the field


 Security: It is too easy to screw up
 It has side effects
 Fails other first class engineering requirements (too slow, unstable, hard to
deploy, etc)
 I am looking for destructive analysis on these techniques,
and will accept criticism on any of the above fronts
 Here is what I know so far
THE HANDLERS APPEAR RELATIVELY
SOLID
 No known SQL Injection bypasses for Base64 into a b64d()
function
 Using a fast base64 decode – could be flaws here
 Could be databases that don’t type-lock return values

 No known flaws when putting arbitrary text into a
span.textContent field
 Well, except it doesn’t work in IE. Will port to its wonky DOM
 Most testing is in Firefox -- Could be problems in Chrome/Safari,
Opera, etc.
 No known flaws when creating arbitrary DOM elements and
populating them, rather than pushing HTML
 IE6 is apparently slow at this
 Need to enumerate the full set of tags which are safe to put into
HTML
EVAL ADDS SOME RISK
 Don’t buy that a PHP server is safer if it isn’t running
eval
 Month of PHP Bugs = PHP not safe against any arbitrary
PHP, eval or not
 Eval in this context can make programmer errors more
severe
 Correct: eval(b("select * from foo where x='^^x'"));
 Incorrect: eval(b("select * from foo where x = '$x';"));
 Before we had SQLi. Now we potentially have front end code
execution!
 This is why it’s now ^^foo instead of $!foo
MANAGING RISK OF EVAL
 b() can be smarter
 It can be aware of strings that break out of string-returner
 It can be aware of SQL grammar, to the point that in order to
write a right hand variable, it must be ^^’d
 Select * from foo where x=^^x and y=safe(1);
 It can even be self-auditing – in PHP, it can use
debug_backtrace() to find the line that called it, and validate
that that line doesn’t have an unsafe language deref
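A sketch of a stricter `b()`, in JavaScript and without eval: variables are looked up in an explicitly passed `scope` object (an assumption of this sketch, since JavaScript cannot reach into the caller's locals), and any template that still contains a native-style `$name` is refused. In PHP a double-quoted `"$x"` is substituted before `b()` ever runs, which is why the slide suggests inspecting the calling source line. This sketch only covers the case where the template text reaches `b()` intact.

```javascript
// Hypothetical stricter b(): expands ^^name, rejects $name.
function b(template, scope) {
  // Self-audit: a native deref in the template means the author bypassed ^^.
  if (/\$[A-Za-z_{]/.test(template)) {
    throw new Error("unsafe deref in template; use ^^name");
  }
  return template.replace(/\^\^([A-Za-z_][A-Za-z0-9_]*)/g, (match, name) => {
    if (!Object.prototype.hasOwnProperty.call(scope, name)) {
      throw new Error("unknown variable: " + name);
    }
    const encoded = Buffer.from(String(scope[name]), "utf8").toString("base64");
    return 'b64d("' + encoded + '")';
  });
}

console.log(b("select * from foo where x=^^x;", { x: "yyyyy" }));
// prints: select * from foo where x=b64d("eXl5eXk=");

let rejected = false;
try {
  b("select * from foo where x = '$x';", { x: "yyyyy" });
} catch (e) {
  rejected = true;
}
console.log(rejected);
```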
WHAT ONLY SORT OF WORKS
 “Requiring” Single Quotes
 In some languages, '$foo' doesn’t interpolate, while "$foo"
does
 So, the thinking is, require eval(b('$foo'))
 This is a policy that cannot be enforced by present compilers
or languages (both '$foo' and "$foo" turn into a string in the
parse tree)
 Could be enforced by a preprocessor
 At large shops, significant improvements in security are won by
blocking otherwise legal expressions as a coding policy


 Not convinced that smaller shops can/should absorb
PERFORMANCE
 Eval is slower than compiled code
 Translating strings could be a major pain point in some
languages
 Easy to cache the translation (because we retain the boundary,
accessing the normalized query form is trivial)
 Could potentially parameterize/accelerate more, because it’s
suddenly easy for the framework to autorecognize repeated
queries
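A sketch of why caching is easy once the boundary is retained: the literal parts of the query, joined with placeholders, are a stable key no matter what data flows through. The names `cachedSql` and `fakePrepare` are hypothetical. `fakePrepare` stands in for whatever expensive translation or statement preparation a real framework would do.

```javascript
// Cache of translated queries, keyed by normalized query text.
const prepared = new Map();
let prepareCount = 0;

// Stand-in for an expensive translate/prepare step.
function fakePrepare(text) {
  prepareCount++;
  return { text };
}

// Hypothetical tag: translates once per distinct query shape.
function cachedSql(literals, ...values) {
  const text = literals.join("?"); // normalized form: data never appears
  let stmt = prepared.get(text);
  if (!stmt) {
    stmt = fakePrepare(text);
    prepared.set(text, stmt);
  }
  return { stmt, params: values.map((v) => String(v)) };
}

function lookup(x) {
  return cachedSql`select * from foo where x=${x};`;
}

lookup("a");
lookup("b");
console.log(prepareCount); // the second call reuses the cached statement
```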
 Base64 is fast
 Slight bandwidth increase, but nothing compared to
URLEncoding
ANYTHING ELSE?
 I don’t know.
 Hope: There’s about two months till Black Hat. Let’s
find out!
 This isn’t a recommendation yet
 Clearly what we are doing right now is not working
 Let’s find out the best things we can do with the present
languages
 Let’s find out what we’d want from future languages
 It’s time we got involved in the discussion of what software
looks like
