Big Data at the Speed of Business
Isaac Mosquera
Director of Mobile, ShareThis

Clint Sharp

Principal Big Data Product Manager, Splunk

What We’ll Talk About
• Our quest for visibility
• Analyzing at scale
• Splunk and Big Data
• Where do you start?
• Q&A

About ShareThis and Socialize

• ShareThis makes the world more connected, trusted and valuable through sharing
• Powers the social web, touching the lives of 95 percent of U.S.
• Acquires Socialize, which makes mobile and social more engaging
• Socialize integrated into thousands of iOS and Android apps
• Installed on 80M+ devices





Evaluating 20 Billion Ad Impressions Monthly

Little Bit About Real-Time Bidding
[Diagram labels: Ad Request, Bid Response, Socialize Bidder, Winning Bidder's Ad, Ad Impression, Ad Click]

All this needs to happen in less than 100 milliseconds!
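One way to picture the 100 millisecond budget is a bidder that answers "no bid" whenever its own decision takes too long. The sketch below is illustrative only and is not the Socialize Bidder's code: `bid_or_pass`, `compute_bid`, and the thread pool size are assumed names and values.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

# Assumption: the whole bid decision must fit in the 100 ms budget.
DEADLINE_SECONDS = 0.100

# Hypothetical shared pool; the size is an arbitrary illustrative value.
_pool = ThreadPoolExecutor(max_workers=4)


def bid_or_pass(compute_bid, ad_request, deadline=DEADLINE_SECONDS):
    """Return the bid from compute_bid, or None if it misses the deadline."""
    future = _pool.submit(compute_bid, ad_request)
    try:
        return future.result(timeout=deadline)
    except FutureTimeout:
        # cancel() only helps if the task has not started yet;
        # a running task is left to finish and its result is ignored.
        future.cancel()
        return None
```

A late answer is treated the same as no answer, which is why the caller gets `None` rather than a delayed bid.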

So What Are Some of the Problems?
• Ingesting more than 10,000 queries per second
• Which bids are > 100 ms?
• Quickly finding any errors within the system

Decision Making (Bid Algorithms)
• Campaign spending
• Campaign efficiency
• Dissect data by:
  – apps
  – users
  – devices

Analyzing Big Data Efficiently
Analysis / Aggregation


Some Options
RDBMS, RDBMS, NoSQL
• SQL functions like count() present problems at scale
• Write operations are too high for a single DB, which is also a single point of failure
• Would work well for high inserts and queries; however, we would need to build alerting, charting and reporting dashboards
• Easy to set up and query using Hive; however, we would have to set up a new environment and learn new technology


Splunk Fits the Bill
• Operational Reporting: Easily identify problems and prevent erroneous spending. When an alert goes off we hit a script which shuts off the bidder.
• Ad Hoc Queries: Allows us to find patterns in the data to improve our bid algorithms
• Application Reporting: Instantly know campaign metrics for us and our clients
• Scalability: Adding new RTB service providers means billions of new ad requests. Scaling horizontally is key.
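The alert-triggered shutoff can be pictured as a simple kill switch. The sketch below is an assumption about how such a script might work, not the actual ShareThis script: the flag file and the function names `disable_bidding` and `bidding_enabled` are hypothetical.

```python
import os


def disable_bidding(flag_path, reason):
    """Run by the alert script: create a flag file that tells the bidder to stop."""
    tmp_path = flag_path + ".tmp"
    with open(tmp_path, "w") as handle:
        handle.write(reason)
    # os.replace is atomic, so the bidder never sees a half-written flag.
    os.replace(tmp_path, flag_path)


def bidding_enabled(flag_path):
    """Checked by the bidder before each bid: no flag file means keep bidding."""
    return not os.path.exists(flag_path)
```

Writing the reason into the flag file leaves a record of which alert stopped the bidder; removing the file would turn bidding back on.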

index=ad_events displayed_ad
| bin _time span=1m
| stats count(meta.displayed_ad) as displays sum(eval(price/1000)) as dollars_spent avg(price) as avg_cpm_price by campaign_id _time
| mysqloutput spec=ads-prod table=ads_analytics insert="campaign_id, stat_date, displays, dollars_spent, avg_cpm_price"
[Diagram labels: three Indexers, Search Head, RDBMS (Generated Reports)]
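For readers less familiar with SPL, the per-minute rollup in the search above can be approximated in plain Python. This is a rough sketch under assumptions: each event is a dict with hypothetical keys `time` (epoch seconds), `campaign_id`, and `price` (a CPM price), and `stat_date` is taken to be the start of the minute bucket.

```python
from collections import defaultdict


def rollup_per_minute(events):
    """Group displayed-ad events by campaign and minute, then summarize prices."""
    buckets = defaultdict(list)
    for event in events:
        minute = int(event["time"]) // 60 * 60  # like: bin _time span=1m
        buckets[(event["campaign_id"], minute)].append(event["price"])

    rows = []
    for (campaign_id, minute), prices in sorted(buckets.items()):
        rows.append({
            "campaign_id": campaign_id,
            "stat_date": minute,
            "displays": len(prices),
            # CPM is a price per thousand impressions, hence the division.
            "dollars_spent": sum(prices) / 1000,
            "avg_cpm_price": sum(prices) / len(prices),
        })
    return rows
```

Each returned row has the same columns that the search hands to the reporting table.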

Using Splunk to Analyze Operational Data
Interactive analysis with Search Processing Language:
source="nginx-prod.log" | stats avg(ResponseTime) as avg_rtime, p95(ResponseTime) as p95_rtime, stdev(ResponseTime) as stdev_rtime

Easily digest information through charts
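The three statistics in that search can be reproduced on a small sample with the Python standard library. This is only a sketch: the function name is hypothetical, and the 95th percentile uses the nearest-rank method, which is an assumption and may differ slightly from the value Splunk reports on large data sets.

```python
from statistics import mean, stdev


def response_time_summary(response_times):
    """Return the average, 95th percentile, and sample standard deviation."""
    ordered = sorted(response_times)
    # Nearest-rank percentile: the smallest value that covers 95% of samples.
    rank = (95 * len(ordered) + 99) // 100  # integer ceiling of 0.95 * n
    return {
        "avg_rtime": mean(ordered),
        "p95_rtime": ordered[rank - 1],
        "stdev_rtime": stdev(ordered),  # sample standard deviation
    }
```

The average alone hides slow outliers, which is why the percentile and the standard deviation are reported next to it.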

Final Architecture
[Architecture diagram labels: Socialize Bidder; three Indexers; Cache Cluster (Memcache nodes); S3 Snapshots; Search Head; RDBMS (Generated Reports)]

Driving Insights from Big Data

The ShareThis Insights Platform
On Father's Day:
• "What were the most-shared topics?"
• "What type of beers do people drink?"

[Diagram labels: Hadoop, API, ETL, Pre-aggregation, Analytics]

Finding the Optimal Approach
What should be the core focus or competency of your team?

• Hadoop and MapReduce are great for complex data science on data at rest. The previous architecture took 9 months with a team of engineers, data architects, etc.
• The Splunk platform delivers real-time, interactive analysis. We can build many of the same insights within 1 hour.
• Conclusion: find the optimal approach for the business


What About Ad Hoc Analysis?

PR Insights Example
• What was the situation? (e.g. fast-moving business, needed real-time insights)
• What was the PR team struggling with? It was difficult to find useful data to build interesting use cases
• What did they want? They wanted a flexible real-time reporting environment to extract insights useful for the market
• How did my team help? Delivered a single dashboard that contained real-time data on the sharing behaviors across our network

PR Insights Dashboard

Let's Not Forget the Low-Hanging Fruit

Operational Analytics for an Online World
Driving Superior Customer Experience
• How many 500 errors have I had over time?
• Look for anomalies and spikes!
• Zone in directly to the customer!
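Counting 500 errors over time and looking for spikes can be sketched outside Splunk as well. The code below is a simplified illustration: the `(timestamp, status)` record shape, the function names, and the threshold of two standard deviations are all assumptions.

```python
from collections import Counter
from statistics import mean, pstdev


def count_500s_per_minute(records):
    """Count HTTP 500 responses per minute from (epoch_seconds, status) pairs."""
    counts = Counter()
    for timestamp, status in records:
        if status == 500:
            counts[int(timestamp) // 60 * 60] += 1
    # Minutes without any 500 errors do not appear in the result.
    return dict(counts)


def find_spikes(counts, threshold=2.0):
    """Return the minutes whose count sits more than `threshold` deviations above the mean."""
    values = list(counts.values())
    if len(values) < 2:
        return []
    average, spread = mean(values), pstdev(values)
    if spread == 0:
        return []  # every minute looks the same, so nothing stands out
    return sorted(
        minute for minute, count in counts.items()
        if (count - average) / spread > threshold
    )
```

The minutes returned by `find_spikes` are the natural starting points for drilling into individual requests.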

Online Device Notifications

Notifications Systems
[Diagram labels: API, Notification, Feedback Processor, Apple (APNS), Google (GCM)]

One More Thing …


