
BIG DATA APPROACHES TO CLOUD SECURITY

Paul Morse, President, WebMall Ventures
Cloud Security Alliance, Seattle Chapter
3/28/2013

"Big Data is not just about lots of data, it is about having the ability to extract meaning; to sort through the masses of data elements to discover the hidden pattern, the unexpected correlation."
Art Coviello, executive chairman of RSA

"On the surface, Big Data seems to be all about business intelligence and analytics, but it also affects the nitty-gritty of power and cooling, networking, storage and data center expansion."

AGENDA
Observations
Cloud Architectures/Components
Machine-Generated Data
Sources of Data
Time Sequencing of Events
Searching for Behavior
Recent Hack Examples

OBSERVATIONS
Big Data solutions are changing the game for security practitioners and execs.
They provide the ability to look at discovery, detection and remediation across large portions of the organization in entirely new ways.
Correlation between seemingly unrelated events in near real time is now relatively easy.
There is a growing range of solution types, from simple to highly complex:
  Roll-your-own to pre-packaged solutions
  On-prem, public cloud-based and hybrid
  Simple log search to predictive analysis with complex dashboards and reporting

Some solutions have an extremely short time to value.
"Big Data washing," like cloud washing, is showing up.
Prices vary, from free to mondo.
It is NOT the holy grail for security, but it has many advantages over traditional SIEM products: real time, large amounts of data, broad event correlation, etc.

SET THE STAGE


Many perspectives on Cloud Computing

Main focus for this talk: the Public Cloud Provider perspective


You own the facility, all of it: an infrastructure-centric discussion

How do Big Data solutions improve Security?

YOUR CLOUD DATACENTER

DATA SOURCES
Backup generators, backup batteries, power distribution
SCADA, door sensors, card key systems
Wireless devices, PCs, printers, tablets, phones?, temperature sensors
RFID, storage, servers, routers/switches
Water system, lighting controls

This is your attack surface.

I want all the data in one searchable repository, available in near real time.
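A minimal sketch of the "one searchable repository" idea, assuming every source is reduced to a source label, a timestamp, and a raw text line. The `Event` and `EventStore` names are hypothetical, and a toy in-memory inverted index stands in for a real Big Data store; it only illustrates why keyword search across otherwise unrelated sources becomes cheap once everything lands in one index.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Event:
    source: str       # hypothetical source label, e.g. "router" or "card_reader"
    timestamp: float  # seconds since the epoch
    text: str         # the raw machine-generated line


@dataclass
class EventStore:
    """Toy inverted index: token -> positions of the events containing it."""
    events: list = field(default_factory=list)
    index: dict = field(default_factory=lambda: defaultdict(set))

    def ingest(self, event: Event) -> None:
        position = len(self.events)
        self.events.append(event)
        # Tokens keep dots so IP addresses such as 172.24.16.21 stay whole.
        for token in re.findall(r"[\w.]+", event.text.lower()):
            token = token.strip(".")
            if token:
                self.index[token].add(position)

    def search(self, *terms: str) -> list:
        """Return events containing every term (AND search), in ingest order."""
        matches = None
        for term in terms:
            hits = self.index.get(term.lower(), set())
            matches = hits if matches is None else matches & hits
        return [self.events[i] for i in sorted(matches or ())]


if __name__ == "__main__":
    store = EventStore()
    store.ingest(Event("router", 1.0, "2 LOGIN FAILURES FROM 172.24.16.21"))
    store.ingest(Event("card_reader", 2.0, "West Gate,00000100,Peterson,Bob"))
    for hit in store.search("login", "172.24.16.21"):
        print(hit.source, hit.text)
```

The same query touches the router and the card reader data without knowing either format; a production system adds persistence, sharding, and time-range filtering on top of this idea.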

SECURE? THINK AGAIN.


Internet Mapping Project

"Harmless" port ping and bot install


660 million IPs with 71 billion ports tested
460 million devices responded
Resulted in 420 thousand bots
Stupid uid/pwd combos: Admin/admin, Admin/no pwd, root/root, root/no pwd

What's on your network?


http://internetcensus2012.bitbucket.org/paper.html
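The census result suggests a simple behavioral search over your own authentication logs: one source trying the default usernames from the slide (admin, root) against many different devices. A minimal sketch, assuming login attempts have already been parsed into (source IP, device, username) tuples; the function name and the `min_devices` threshold of 3 are hypothetical choices, not tuned values.

```python
from collections import defaultdict

# Usernames taken from the default uid/pwd combos on the slide.
DEFAULT_USERNAMES = {"admin", "root"}


def find_default_credential_sweeps(attempts, min_devices=3):
    """Flag source IPs that tried default usernames on at least min_devices devices.

    attempts: iterable of (src_ip, device, username) tuples from login logs.
    Returns {src_ip: sorted list of targeted devices}.
    """
    targets = defaultdict(set)
    for src_ip, device, username in attempts:
        if username.lower() in DEFAULT_USERNAMES:
            targets[src_ip].add(device)
    return {
        ip: sorted(devices)
        for ip, devices in targets.items()
        if len(devices) >= min_devices
    }
```

Counting distinct devices per source, rather than raw attempts, separates a sweep across the network from one administrator mistyping a password on a single box.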

CAUSE FOR PAUSE


We hope other researchers will find the data we have collected useful and that this publication will help raise some awareness that, while everybody is talking about high class exploits and cyberwar, four simple stupid default telnet passwords can give you access to hundreds of thousands of consumer as well as tens of thousands of industrial devices all over the world.

MACHINE DATA
Isn't it really all machine data?
Machine-generated data (MGD) is the generic term for information that was automatically created by a computer process, application, or other machine without the intervention of a human. Examples:
Network device log files
Event logs
Application logs
RFID logs
Storage logs
HVAC logs
Sensor data
Etc.

MACHINE DATA EXAMPLES


Apache
[Fri Sep 09 10:42:29.902022 2011] [core:error] [pid 35708:tid 4328636416] [client 72.15.99.187] File does not exist: /usr/local/apache2/htdocs/favicon.ico

Juniper
Sep 10 07:06:45 host rpd[6451]: bgp_listen_accept: Connection attempt from unconfigured neighbor: 10.0.8.1+1350
Sep 10 07:07:53 host login: 2 LOGIN FAILURES FROM 172.24.16.21
Sep 10 07:08:25 host inetd[2785]: /usr/libexec/telnetd[7251]: exit status 0x100

Oracle/Siebel
SQLParseAndExecute Statement 4 0 2003-05-13 14:07:38 select ROW_ID, NEXT_SESSION, MODIFICATION_NUM from dbo.S_SSA_ID

IIS
192.168.114.201, -, 03/20/01, 7:55:20, W3SVC2, SALES1, 172.21.13.45, 4502, 163, 3223, 200, 0, GET, /DeptLogo.gif, -,
172.16.255.255, anonymous, 03/20/01, 23:58:11, MSFTPSVC, SALES1, 172.16.255.255, 60, 275, 0, 0, 0, PASS, /Intro.htm, -,

Card Reader
10/23/04 06:16:32,Administrator,00000101,Anderman,Penny,00026,01000,10/22/2005
10/23/04 06:16:32,West Gate,00000100,Peterson,Bob,00954,01000,10/21/2005
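Before these formats can share one index, they have to be normalized. A minimal sketch, assuming a common event shape of source, timestamp, and message. The parser names are hypothetical, the regular expressions are written against the sample lines above rather than against each vendor's full format specification, and syslog-style Juniper lines carry no year, so the caller supplies one. The card reader fields after the timestamp are kept raw because their meanings are not given here.

```python
import re
from datetime import datetime

# Matches the Apache error-log sample above.
APACHE_ERROR = re.compile(
    r"^\[(?P<time>[^\]]+)\] \[(?P<module>\w+):(?P<level>\w+)\] "
    r"\[pid (?P<pid>\d+)(?::tid \d+)?\] \[client (?P<client>[\d.]+)\] (?P<message>.*)$"
)
# Syslog-style line: "Mon DD HH:MM:SS host process[pid]: message".
SYSLOG = re.compile(
    r"^(?P<time>\w{3} +\d+ \d\d:\d\d:\d\d) (?P<host>\S+) "
    r"(?P<process>[^:\[\s]+)(?:\[(?P<pid>\d+)\])?: (?P<message>.*)$"
)


def parse_apache_error(line):
    m = APACHE_ERROR.match(line)
    if m is None:
        return None
    return {
        "source": "apache",
        "time": datetime.strptime(m["time"], "%a %b %d %H:%M:%S.%f %Y"),
        "level": m["level"],
        "client": m["client"],
        "message": m["message"],
    }


def parse_juniper_syslog(line, year):
    m = SYSLOG.match(line)
    if m is None:
        return None
    return {
        "source": "juniper",
        # Syslog timestamps omit the year, so it must be supplied.
        "time": datetime.strptime(f"{year} {m['time']}", "%Y %b %d %H:%M:%S"),
        "host": m["host"],
        "process": m["process"],
        "pid": m["pid"],  # None when the process logged no pid
        "message": m["message"],
    }


def parse_card_reader(line):
    parts = line.split(",")
    return {
        "source": "card_reader",
        "time": datetime.strptime(parts[0], "%m/%d/%y %H:%M:%S"),
        # Remaining fields are kept raw; their meanings are not documented here.
        "message": ",".join(parts[1:]),
    }
```

This is the "transforms and connectors" work that vendors package as add-ons; each new device type needs its own parser before its events can be correlated with the rest.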

TIME SEQUENCE OF EVENTS


Figure: timeline of events across time slots T0 to T19. Event rows: Outbound Traffic, Terminate Session, Delete logs, Installer runs, Upload Small File, Command, Fail, Pass, Login Attempt, Server, TOR (top-of-rack switch), LB (load balancer), Front end, IP Address/Packet.

TIME SEQUENCE OF EVENTS


Figure: timeline of events across time slots T0 to T18. Event rows: Terminate Session, Delete logs, Update, Upload Small File, Command, Fail, Pass, Login Attempt, Device, TOR, LB, Front end, IP Address/Packet.

TIME SEQUENCE OF EVENTS


Figure: timeline of events across time slots T0 to T18. Event rows: Terminate Session, Delete logs, Update, Upload Small File, Command, Fail, Pass, Login Attempt, Device, IP Address/Packet.
Companion panel: door access events for Door 1 through Door 5 across T-30 to T45.
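The timelines above amount to a search for an ordered chain of events within a time window, optionally correlated with physical access. A minimal sketch, assuming events are already normalized to (time, kind) pairs sorted by time. The event-kind labels are hypothetical names taken from the figure rows, time is in whatever unit the T axis uses, and the greedy earliest-match search is one simple way to find an ordered subsequence, not a specific product's algorithm.

```python
# Hypothetical labels modeled on the figure rows; order matters.
SUSPICIOUS_SEQUENCE = [
    "login_attempt", "pass", "command",
    "upload_small_file", "delete_logs", "terminate_session",
]


def match_sequence(events, pattern, window):
    """Find pattern as an ordered subsequence of events within `window` time units.

    events: list of (time, kind) sorted by time.
    Returns the times of the matched steps, or None if no start fits the window.
    """
    for i, (start, kind) in enumerate(events):
        if kind != pattern[0]:
            continue
        matched = [start]
        step = 1
        # Greedily take the earliest next matching event; this minimizes the end time.
        for t, k in events[i + 1:]:
            if step == len(pattern) or t - start > window:
                break
            if k == pattern[step]:
                matched.append(t)
                step += 1
        if step == len(pattern):
            return matched
    return None


def badge_preceded(door_events, login_time, lookback):
    """True if any (time, door) badge event falls in [login_time - lookback, login_time]."""
    return any(login_time - lookback <= t <= login_time for t, _door in door_events)
```

A match with no badge event before the login points to a remote intruder; a match that follows a door event narrows the search to who was physically in the building.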

SOME AREAS TO CONSIDER


Ingesting various data formats
  Many vendors claim it is easy, when it may not be
  Transforms and connectors may be required (they affect performance)
  Device companies create add-ons, connectors, dashboards, transforms, queries, etc.
  Speed of indexing determines real-time abilities
  Do you need to index ALL machine data?

Vendor-specific query languages
  No standard, some commonality
  Learning curve for seriously complex queries and for operationalizing the environment

Dashboards and visualizations vary
A large number of simultaneous queries is required
Workflow is critical: what happens when you find something?
Implementation architecture: lots of hardware? Bandwidth? Security? Users?
Data governance: you found what?
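Whether indexing keeps up with ingest is simple arithmetic, and it is one reason to ask whether ALL machine data needs to be indexed. A minimal sketch under a simplified queue model (steady rates, lag equals backlog divided by indexing rate); the function name and the example rates are hypothetical, not measurements of any product.

```python
def indexing_lag_seconds(ingest_rate, index_rate, elapsed_s, drop_fraction=0.0):
    """Approximate delay before a newly arrived event becomes searchable.

    ingest_rate, index_rate: events per second, assumed steady.
    elapsed_s: how long that load has been sustained.
    drop_fraction: share of events filtered out before indexing (0.0 to 1.0).
    """
    if index_rate <= 0:
        raise ValueError("index_rate must be positive")
    effective_rate = ingest_rate * (1.0 - drop_fraction)
    # The backlog only grows while events arrive faster than they are indexed.
    backlog = max(0.0, (effective_rate - index_rate) * elapsed_s)
    return backlog / index_rate


if __name__ == "__main__":
    print(indexing_lag_seconds(50_000, 40_000, 3600))                      # 900.0
    print(indexing_lag_seconds(50_000, 40_000, 3600, drop_fraction=0.25))  # 0.0
```

With 50,000 events/s arriving and an indexer that handles 40,000 events/s, an hour of sustained load leaves a 36-million-event backlog, so new events become searchable about 900 s late. Filtering a quarter of the events before indexing (37,500 events/s) removes the lag entirely.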

HACK EXAMPLES
DOJ in January
Defacement: what specific behavior happened, and what did they do?
  Logged in remotely
  Completely replaced index.*

Solution: monitor index.*, set up a parsing stream, and search for a known code in the HTML. Call a workflow if the file changes or the code doesn't match.
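A minimal sketch of that defacement check, assuming you can read the web root directly and that a known code (the hypothetical `site-check` HTML comment) is embedded in the legitimate page. The function names are hypothetical, `on_alert` stands in for whatever workflow you call, and a SHA-256 hash is used as the "file changed" test.

```python
import hashlib
import tempfile
from pathlib import Path


def take_baseline(webroot):
    """Record a SHA-256 hash for every index.* file in the web root."""
    return {
        str(p): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in Path(webroot).glob("index.*")
    }


def check_pages(baseline, marker, on_alert):
    """Call on_alert(path, problems) for any page that changed or lost the marker."""
    for path, expected in baseline.items():
        page = Path(path)
        if not page.exists():
            on_alert(path, ["file missing"])
            continue
        data = page.read_bytes()
        problems = []
        if hashlib.sha256(data).hexdigest() != expected:
            problems.append("content changed")
        if marker.encode() not in data:
            problems.append("marker missing")
        if problems:
            on_alert(path, problems)


if __name__ == "__main__":
    marker = "<!-- site-check:7f3a -->"  # hypothetical code embedded in the real page
    with tempfile.TemporaryDirectory() as webroot:
        index = Path(webroot) / "index.html"
        index.write_text(f"<html>{marker}<body>Welcome</body></html>")
        baseline = take_baseline(webroot)
        index.write_text("<html><body>Defaced</body></html>")  # simulate the attack
        check_pages(baseline, marker, lambda path, problems: print(path, problems))
```

The hash catches any change, while the marker check distinguishes a wholesale replacement from a legitimate edit that kept the code. A fuller version would also re-glob the web root to catch newly added index.* files.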

DDoS
Overwhelm the website
Solution: compare the request rate's increase to a previous norm. If the disparity is great enough, call a workflow to check the IP addresses of the source(s). Depending on the results, do nothing, or script a filter or block.
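A minimal sketch of that DDoS workflow, assuming per-interval request counts and the list of source IPs for the current interval are available from the logs. The names and thresholds (a 5x jump over the mean of earlier intervals, and a 20% share of traffic to call a source a heavy hitter) are hypothetical, and mapping "a few heavy sources" to block and "a broad spike" to filter is one reasonable reading of the slide, not a prescribed rule.

```python
from collections import Counter
from statistics import mean


def rate_spike(history, current, factor=5.0):
    """True if the current request count exceeds `factor` times the historical mean."""
    if not history:
        return False
    return current > factor * max(mean(history), 1.0)


def heavy_sources(request_ips, share=0.2):
    """Source IPs responsible for at least `share` of the current requests."""
    counts = Counter(request_ips)
    total = sum(counts.values())
    return [ip for ip, n in counts.most_common() if n / total >= share]


def ddos_workflow(history, request_ips, factor=5.0, share=0.2):
    """Return ("do nothing" | "block" | "filter", list of IPs to block)."""
    if not rate_spike(history, len(request_ips), factor):
        return "do nothing", []
    heavy = heavy_sources(request_ips, share)
    if heavy:
        return "block", heavy  # a few sources dominate: block them
    return "filter", []        # spike spread over many sources: rate-limit instead
```

Comparing against the mean of recent intervals keeps the test relative to each site's own norm rather than a fixed global number.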

VENDORS AND GETTING STARTED


Hadoop with Flume, HP ArcSight, Loggly, LogRhythm, Sumo Logic, LogScape, Logstash, Sawmill, Splunk, Splunk Storm

Getting Started
Easiest: cloud based
  Sumo Logic, Splunk Storm
Download and install
  Loggly, LogRhythm, LogScape, Logstash, Sawmill, Splunk, Hadoop/Flume/Pig