Copyright  ©  2013  Splunk  Inc.

 

Big Data at the Speed of Business
Isaac Mosquera
Director of Mobile, ShareThis

Clint Sharp

Principal Big Data Product Manager, Splunk

What We’ll Talk About
• Our quest for visibility
• Analyzing at scale
• Splunk and Big Data
• Where do you start?
• Q&A

About Splunk
Company (NASDAQ: SPLK)
• Founded 2004, first software release in 2006
• HQ: San Francisco

Business Model / Products
• Industry-leading machine data platform
• On-premise, in the cloud and SaaS

5,600+ Customers
• 63 of the Fortune 100
• Largest license: 100 Terabytes per day

#1 Big Data Innovator*
* Fast Company's Most Innovative Companies Issue (March 2013)

About ShareThis and Socialize
• ShareThis makes the world more connected, trusted and valuable through sharing
• Powers the social web, touching the lives of 95 percent of U.S.
• Acquires Socialize, which makes mobile and social more engaging
• Socialize integrated into thousands of iOS and Android apps
• Installed on 80M+ devices

Evaluating 20 Billion Ad Impressions Monthly

Little Bit About Real-Time Bidding
[Diagram: real-time bidding (RTB) flow. Labels: Ad Request, Bid Response, Socialize Bidder, Winning Bidder's Ad, Ad Impression, Ad Click]

All this needs to happen in less than 100 milliseconds!

So What Are Some of the Problems?
Operational
• Ingesting more than 10,000 queries per second
• Which bids are > 100ms
• Quickly finding any errors within the system

Decision Making (Bid Algorithms)
• Campaign spending
• Campaign efficiency
• Dissect data by:
  – apps
  – users
  – devices
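The two kinds of questions above (which bids exceed the 100 ms budget, and how the data breaks down by app or device) can be sketched in a few lines of Python. This is only an illustration: the record layout and the field names `app`, `device` and `latency_ms` are assumptions, not the actual Socialize bidder schema.

```python
from collections import Counter

LATENCY_BUDGET_MS = 100  # the budget quoted for an RTB round trip

# Hypothetical bid records; field names are illustrative only.
bids = [
    {"app": "news_reader", "device": "ios", "latency_ms": 42},
    {"app": "news_reader", "device": "android", "latency_ms": 130},
    {"app": "puzzle_game", "device": "ios", "latency_ms": 101},
    {"app": "puzzle_game", "device": "ios", "latency_ms": 88},
]


def slow_bids(records, budget_ms=LATENCY_BUDGET_MS):
    """Return the records whose latency exceeds the budget."""
    return [r for r in records if r["latency_ms"] > budget_ms]


def count_by(records, dimension):
    """Count records grouped by one dimension, e.g. 'app' or 'device'."""
    return Counter(r[dimension] for r in records)


slow = slow_bids(bids)
slow_by_app = count_by(slow, "app")
slow_by_device = count_by(slow, "device")
```

At more than 10,000 requests per second this in-memory approach would not hold up, which is the point of the slides that follow; the sketch only pins down what is being asked of the data.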

Analyzing Big Data Efficiently
1. Collection
2. Storage
3. Analysis/Aggregation
4. Retrieval

Some Options
• RDBMS: SQL functions like count() present problems at scale
• RDBMS: Write operations too high for a single DB, as well as a single point of failure
• NoSQL: Would work well for high inserts and queries, however we would need to build alerting, charting and reporting dashboards
• Hadoop: Easy to set up and query using Hive, however we would have to set up new environments and learn new technology

Splunk Fits the Bill
• Operational Reporting: Easily identify problems and prevent erroneous spending. When an alert goes off we hit a script which shuts off the bidder.
• Ad Hoc Queries: Allows us to find patterns in the data to improve our bid algorithms
• Application Reporting: Instantly know campaign metrics for us and our clients
• Scalability: Adding new RTB service providers means billions of new ad requests. Scaling horizontally is key
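The "alert fires, script shuts off the bidder" idea can be sketched as a small guard. Everything here is hypothetical: the `Bidder` class, the `handle_spend_alert` function and the budget figures are stand-ins for illustration, not the actual ShareThis script or any Splunk alerting API.

```python
class Bidder:
    """Stand-in for the real bidder; it only tracks an on/off flag."""

    def __init__(self):
        self.enabled = True

    def shut_off(self):
        self.enabled = False


def handle_spend_alert(bidder, dollars_spent, budget):
    """Shut the bidder off when spend exceeds budget.

    Returns True if the kill switch fired, False otherwise.
    """
    if dollars_spent > budget:
        bidder.shut_off()
        return True
    return False


bidder = Bidder()
fired_under_budget = handle_spend_alert(bidder, dollars_spent=80.0, budget=100.0)
fired_over_budget = handle_spend_alert(bidder, dollars_spent=120.0, budget=100.0)
```

Keeping the decision ("is spend over budget?") separate from the action ("shut off") makes the guard easy to test without touching a live bidder.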

Analysis/Aggregation
index=ad_events displayed_ad
| bin _time span=1m
| stats count(meta.displayed_ad) as displays sum(eval(price/1000)) as dollars_spent avg(price) as avg_cpm_price by campaign_id _time
| mysqloutput spec=ads-prod table=ads_analytics insert="campaign_id, stat_date, displays, dollars_spent, avg_cpm_price"
[Diagram: three Indexers, Search Head, RDBMS (Generated Reports)]
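For readers less familiar with SPL, the aggregation above amounts to: bucket events into one-minute bins, then compute per campaign and per minute a display count, the dollars spent (price is a CPM, so each impression costs price/1000) and the average CPM. A rough Python equivalent follows; the event layout and sample values are assumptions for illustration, and the final write to the reporting database is omitted.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical ad events; price is a CPM (cost per thousand impressions).
events = [
    {"campaign_id": "c1", "time": "2013-06-01T12:00:05", "price": 2.0},
    {"campaign_id": "c1", "time": "2013-06-01T12:00:40", "price": 4.0},
    {"campaign_id": "c1", "time": "2013-06-01T12:01:10", "price": 3.0},
    {"campaign_id": "c2", "time": "2013-06-01T12:00:59", "price": 1.0},
]


def aggregate_per_minute(events):
    """Group events by (campaign, minute) and compute the report columns."""
    buckets = defaultdict(list)
    for e in events:
        # Truncate the timestamp to the start of its minute (the 1m bin).
        minute = datetime.fromisoformat(e["time"]).replace(second=0, microsecond=0)
        buckets[(e["campaign_id"], minute)].append(e["price"])

    rows = []
    for (campaign_id, minute), prices in sorted(buckets.items()):
        rows.append({
            "campaign_id": campaign_id,
            "stat_date": minute.isoformat(),
            "displays": len(prices),
            "dollars_spent": sum(p / 1000 for p in prices),
            "avg_cpm_price": sum(prices) / len(prices),
        })
    return rows


rows = aggregate_per_minute(events)
```

Each row has the same five columns that the search inserts into the reporting table, so the generated reports can be served from a small, pre-aggregated data set.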

Using Splunk to Analyze Operational Data
Interactive analysis with Search Processing Language:
source="nginx-prod.log" | stats avg(ResponseTime) as avg_rtime, p95(ResponseTime) as p95_rtime, stdev(ResponseTime) as stdev_rtime

Easily digest information through charts
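To make the three statistics in that search concrete, here is a small Python sketch over made-up response times. The percentile helper uses the simple nearest-rank method, which is an assumption; Splunk's percentile calculation may differ in detail. The sample values are hypothetical.

```python
import math
import statistics


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile; Splunk's own method may differ."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical response times in milliseconds, with one slow outlier.
response_times = [12, 15, 11, 14, 13, 250, 16, 12, 15, 14]

summary = {
    "avg_rtime": statistics.mean(response_times),
    "p95_rtime": nearest_rank_percentile(response_times, 95),
    "stdev_rtime": statistics.stdev(response_times),  # sample standard deviation
}
```

With one slow request in ten, the average sits well above the typical response time while the 95th percentile exposes the outlier directly, which is why the search reports both.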

Final Architecture
[Architecture diagram. Components: Socialize Bidder; Splunk (three Indexers, Search Head); Cache Cluster (three Memcache nodes); S3 Snapshots; RDBMS (Generated Reports)]

So, What is Splunk?


Expanding Universe of Data Sources
2012-12-05 07:04:44 Id=00Q000000Rd910EAJ City=New York Country=US CreatedDate=“2012-12-05 07:06:44” Email.jdoe@gmail.com Email_Opt_In_c Customer_Street _Address_c=“123 Main St.” purchased_product_id= product_i BD-01 twitter_username john_t_doe

Business Application Data
Highly Structured

Machine-generated Data

Human-generated Data
Arbitrarily Structured

Industry Leading Platform for Machine Data
Any Machine Data Operational Intelligence

• Ad hoc search
• Monitor and alert
• Report and analyze
• Custom dashboards
• Developer Platform
• HA Indexes and Storage
• Commodity Servers

Analyzing Heterogeneous Data
Universal Index
• No data normalization
• Automatically handles timestamps
• Parsers not required
• Index every term & pattern “blindly”
• No attempt to “understand” up front

Schema-on-the-fly
• Structure applied at search-time
• No brittle schema to work around
• Automatically find transactions, patterns and trends

Flexibility and Fast Time to Value
• Normalization as it’s needed
• Faster implementation
• Easy search language
• Multiple views into the same data
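The idea of applying structure at search time rather than at ingest can be illustrated with a toy example: raw events are stored untouched, and key=value fields are pulled out only when a question is asked. This is a simplified sketch, not a description of how Splunk implements field extraction; the regular expression and the sample events are assumptions.

```python
import re

# key=value pairs, where the value is either double-quoted or a run of
# non-space characters.
PAIR = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')


def extract_fields(raw_event):
    """Pull key=value fields out of a raw event string at read time."""
    fields = {}
    for key, quoted, bare in PAIR.findall(raw_event):
        fields[key] = quoted if quoted else bare
    return fields


# Raw events are kept exactly as they arrived; no schema was declared up front.
raw_events = [
    '2012-12-05 07:04:44 Id=42 City="New York" Country=US',
    '2012-12-05 07:05:10 status=500 uri=/bid latency_ms=131',
]

parsed = [extract_fields(e) for e in raw_events]
```

Because the two events share no fields, a fixed table schema would need either two tables or many empty columns; extracting at read time sidesteps that choice.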

Gain Critical Insights … in Real-time
[Diagram. Sources: Order Processing, Middleware Error, Care IVR, Twitter. Fields shown: Customer ID, Order ID, Product ID, Time Waiting On Hold, Twitter ID, Customer’s Tweet, Company’s Name]

Deep Visibility and Insight for IT and Business
• IT Operations Management
• Application Management
• Security and Compliance
• Web Intelligence
• Business Analytics
• Industrial Data / Internet of Things

Over 5,600 organizations using Splunk across IT and business users

Driving Insights from Big Data

The ShareThis Insights Platform
On Father’s Day:
• “Who were the most shared about topics?”
• “What type of beers do people drink?”

[Diagram labels: Hadoop, API, ETL, Pre-aggregation, Analytics]

Finding the Optimal Approach
What should be the core focus or competency of your team?
• Hadoop and MapReduce are great for complex data science on data at rest – the previous architecture took 9 months with a team of engineers, data architects, etc.
• The Splunk platform delivers real-time, interactive analysis – we can build many of the same insights within 1 hour

Conclusion: find the optimal approach for the business

What About Ad Hoc Analysis?

PR Insights Example
• What was the situation? (e.g. fast moving business, needed real-time insights)
• What was the PR team struggling with? Difficult to find useful data to build interesting use-cases
• What did they want? They wanted a flexible real-time reporting environment to extract insights useful for the market
• How did my team help? Delivered a single dashboard that contained real-time data on the sharing behaviors across our network

PR Insights Dashboard

Let’s not forget the low-hanging fruit

Operational Analytics for an Online World
Driving Superior Customer Experience
• How many 500 errors have I had over time?
• Look for anomalies and spikes!
• Zone in directly to the customer!
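The first two steps above (count 500 errors over time, then look for spikes) can be sketched as follows. The event layout, the per-minute bucketing and the "mean plus two standard deviations" threshold are all assumptions chosen for illustration; a real deployment would pick its own window and threshold.

```python
from collections import Counter
import statistics


def errors_per_minute(events):
    """Count HTTP 500 responses per minute; events are (minute, status) pairs."""
    return Counter(minute for minute, status in events if status == 500)


def spikes(counts, threshold_sigmas=2.0):
    """Return minutes whose count exceeds mean + threshold * population stdev."""
    values = list(counts.values())
    if len(values) < 2:
        return []
    mean = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return sorted(m for m, c in counts.items() if c > mean + threshold_sigmas * sigma)


# Hypothetical traffic: ten minutes of requests with a burst of errors in minute 7.
events = []
for minute in range(10):
    n_errors = 40 if minute == 7 else 2
    events += [(minute, 500)] * n_errors + [(minute, 200)] * 5

counts = errors_per_minute(events)
spike_minutes = spikes(counts)
```

Once a spike minute is known, the third step is a matter of filtering the raw events for that minute down to the affected customer.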

Online Device Notifications

[Diagram: Notifications Systems. Labels: website, API, Notification, Feedback Processor, Apple (APNS), Google (GCM)]

One More Thing …


Announcing Hunk Beta
New product from Splunk delivers interactive data exploration, analysis and visualizations for Hadoop

Splunk Analytics for Hadoop

Derive Actionable Insights from Raw Data
1. Point Splunk at Hadoop Cluster
2. Immediately start exploring, analyzing and visualizing raw data in Hadoop

[Diagram labels: Explore, Analyze, Visualize, Dashboards, Share; Hadoop Storage]

Learn More

splunk.com/bigdata  

Questions?