Big Data at the Speed of Business
Director of Mobile, ShareThis
Principal Big Data Product Manager, Splunk
What We’ll Talk About
• Our quest for visibility
• Analyzing at scale
• Splunk and Big Data
• Where do you start?
• Q&A
Company (NASDAQ: SPLK)
• Founded 2004, first software release in 2006
• HQ: San Francisco
• Industry-leading machine data platform
• On-premise, in the cloud and SaaS
• 63 of the Fortune 100
• Largest license: 100 Terabytes per day
* Fast Company's Most Innovative Companies Issue (March 2013)
About ShareThis and Socialize
• ShareThis makes the world more connected, trusted and valuable through sharing
• Powers the social web, touching the lives of 95 percent of U.S.
• Acquires Socialize, which makes mobile and social more engaging
• Socialize is integrated into thousands of iOS and Android apps
• Installed on 80M+ devices
Evaluating 20 Billion Ad Impressions Monthly
A Little Bit About Real-Time Bidding
[Diagram: real-time bidding (RTB) flow with ad request, bid response, winning bidder's ad, ad impression and ad click]
All this needs to happen in less than 100 milliseconds!
So What Are Some of the Problems?
• Ingesting more than 10,000 queries per second
• Identifying which bids take longer than 100 ms
• Quickly finding any errors within the system
• Campaign spending
• Campaign efficiency
• Dissect data by:
  – apps
  – users
  – devices
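Two of the problems above, spotting bids slower than the 100 ms RTB budget and tracking request volume, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not ShareThis's implementation. The `BidRecord` layout and its field names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical record layout; the field names are assumptions for illustration.
@dataclass
class BidRecord:
    request_id: str
    timestamp_s: float  # Unix time when the ad request arrived
    latency_ms: float   # time from ad request to bid response


RTB_BUDGET_MS = 100.0  # the sub-100 ms budget described on the RTB slide


def slow_bids(records, budget_ms=RTB_BUDGET_MS):
    """Return the request IDs of bids that exceeded the latency budget."""
    return [r.request_id for r in records if r.latency_ms > budget_ms]


def queries_per_second(records):
    """Count requests in each whole-second bucket."""
    return Counter(int(r.timestamp_s) for r in records)


if __name__ == "__main__":
    sample = [
        BidRecord("a", 1000.1, 42.0),
        BidRecord("b", 1000.7, 130.5),
        BidRecord("c", 1001.2, 99.9),
    ]
    print(slow_bids(sample))           # ['b']
    print(queries_per_second(sample))  # Counter({1000: 2, 1001: 1})
```

At 10,000+ queries per second, you would run checks like these inside the indexing or search layer rather than in a single process. The sketch only shows the logic.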
Analyzing Big Data Efficiently
1. RDBMS: SQL functions like count() present problems at scale
2. RDBMS: Write operations are too high for a single DB, which is also a single point of failure
3. NoSQL: Would work well for high inserts and queries; however, we would need to build alerting, charting and reporting dashboards
4. Easy to set up and query using Hive; however, we would have to set up new environments and learn a new technology
Splunk Fits the Bill
• Operational Reporting: Easily identify problems and prevent erroneous spending. When an alert goes off, we hit a script that shuts off the bidder.
• Ad Hoc Queries: Find patterns in the data to improve our bid algorithms.
• Application Reporting: Instantly know campaign metrics for us and our clients.
• Scalability: Adding new RTB service providers means billions of new ad requests. Scaling horizontally is key.
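The alert-to-script flow under Operational Reporting can be sketched as a threshold check that fires a shut-off hook once. In the deck, the trigger is a Splunk alert and the action is a script whose contents are not shown. The per-campaign spend input, the threshold, and the `shut_off_bidder` callback here are all hypothetical.

```python
from typing import Callable, Dict, List


def check_spend_alert(
    spend_per_minute: Dict[str, float],
    max_dollars_per_minute: float,
    shut_off_bidder: Callable[[List[str]], None],
) -> List[str]:
    """Return campaigns over the spend limit and, if any, trigger the shut-off hook once."""
    offenders = sorted(
        cid for cid, dollars in spend_per_minute.items()
        if dollars > max_dollars_per_minute
    )
    if offenders:
        # Hypothetical stand-in for the script the alert calls to stop bidding.
        shut_off_bidder(offenders)
    return offenders


if __name__ == "__main__":
    def fake_shutoff(campaigns):
        print("bidder shut off because of:", campaigns)

    check_spend_alert({"c1": 4.2, "c2": 57.0}, 25.0, fake_shutoff)
    # prints: bidder shut off because of: ['c2']
```

Passing the shut-off action in as a callback keeps the detection logic testable without touching a live bidder.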
index=ad_events displayed_ad | bin _time span=1m | stats count(meta.displayed_ad) as displays sum(eval(price/1000)) as dollars_spent avg(price) as avg_cpm_price by campaign_id _time | mysqloutput spec=ads-prod table=ads_analytics insert="campaign_id, stat_date, displays, dollars_spent, avg_cpm_price"
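To make the search above concrete, here is a minimal Python sketch of the same per-minute, per-campaign aggregation. Events are bucketed into 1-minute windows, as `bin _time span=1m` does, then counted, summed and averaged by `campaign_id`. Some assumptions apply:

* `price` is a CPM (cost per thousand impressions) in dollars, which is why the search divides it by 1000 to get spend.
* Every event counts as one display.
* The event dictionaries are hypothetical.

The MySQL output step is omitted.

```python
from collections import defaultdict


def aggregate_displays(events):
    """Group ad-display events by (campaign_id, minute) and compute
    displays, dollars_spent and avg_cpm_price, mirroring the SPL stats call.

    Each event is a dict with '_time' (Unix seconds), 'campaign_id'
    and 'price' (assumed to be a CPM in dollars).
    """
    buckets = defaultdict(list)
    for e in events:
        minute = int(e["_time"]) // 60 * 60  # floor to the start of the minute
        buckets[(e["campaign_id"], minute)].append(float(e["price"]))

    rows = []
    for (campaign_id, minute), prices in sorted(buckets.items()):
        rows.append({
            "campaign_id": campaign_id,
            "_time": minute,
            "displays": len(prices),
            "dollars_spent": sum(p / 1000 for p in prices),  # CPM -> per-impression cost
            "avg_cpm_price": sum(prices) / len(prices),
        })
    return rows
```

Each returned row corresponds to one row the search would write to the `ads_analytics` table.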
[Diagram: three Splunk indexers feeding a search head, which writes generated reports to an RDBMS]
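This architecture has several indexers feeding a search head. One reason it can scale horizontally is that statistics like count, sum and average can be computed as partial results on each indexer and then merged centrally. The Python sketch below illustrates that idea under simplifying assumptions. It is not a description of Splunk's internal protocol, and the `Partial` format is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Partial:
    """Per-indexer partial aggregate (hypothetical format)."""
    count: int
    total: float


def partial_stats(prices):
    """Computed locally on one indexer over its own slice of events."""
    return Partial(count=len(prices), total=sum(prices))


def merge(partials):
    """Combined on the search head: counts and sums add, and the average is
    derived from the merged totals rather than by averaging averages."""
    count = sum(p.count for p in partials)
    total = sum(p.total for p in partials)
    avg = total / count if count else None
    return count, total, avg


if __name__ == "__main__":
    shards = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]  # three indexers' data
    print(merge([partial_stats(s) for s in shards]))  # (6, 21.0, 3.5)
```

Averaging the three per-indexer averages instead would give about 3.17, not 3.5. Shipping counts and sums avoids that error.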
Using Splunk to Analyze Operational Data
Interactive analysis with Search Processing Language:
source="nginx-prod.log" | stats avg(ResponseTime) as avg_rtime, p95(ResponseTime) as p95_rtime, stdev(ResponseTime) as stdev_rtime
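For readers less familiar with SPL, the same three statistics can be computed with Python's standard library. This sketch uses the nearest-rank method for the 95th percentile and the sample standard deviation. Splunk's own `p95` and `stdev` may use different methods; for example, its percentile functions can approximate on large data sets. Results may therefore differ slightly.

```python
import statistics


def response_time_stats(times_ms):
    """Return (avg, p95, stdev) for a list of response times."""
    if len(times_ms) < 2:
        raise ValueError("need at least two samples for a sample standard deviation")
    ordered = sorted(times_ms)
    # Nearest-rank 95th percentile, using integer ceil(0.95 * n) to avoid float rounding.
    rank = (95 * len(ordered) + 99) // 100
    p95 = ordered[rank - 1]
    return statistics.mean(ordered), p95, statistics.stdev(ordered)


if __name__ == "__main__":
    avg, p95, sd = response_time_stats(list(range(1, 101)))
    print(avg, p95)  # 50.5 95
```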
Easily digest information through charts
[Diagram: three Splunk indexers, Memcache, and an RDBMS holding generated reports]
So, What is Splunk?
Expanding Universe of Data Sources
2012-12-05 07:04:44 Id=00Q000000Rd910EAJ City=New York Country=US CreatedDate=“2012-12-05 07:06:44” Email.email@example.com Email_Opt_In_c Customer_Street _Address_c=“123 Main St.” purchased_product_id= product_i BD-01 twitter_username john_t_doe
Industry Leading Platform for Machine Data
From any machine data to operational intelligence:
• Ad hoc search
• Monitor and alert
• Report and analyze
• Custom dashboards
• HA indexes and storage
Analyzing Heterogeneous Data
Universal Index
• No data normalization
• Automatically handles timestamps
• Parsers not required
• Index every term & pattern “blindly”
• No attempt to “understand” up front
Schema-on-the-fly
• Structure applied at search-time
• No brittle schema to work around
• Automatically find transactions, patterns and trends
Flexibility and Fast Time to Value
• Normalization as it’s needed
• Faster implementation
• Easy search language
• Multiple views into the same data
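Applying structure at search time rather than at load time can be sketched in Python. The raw line is stored untouched, and `key=value` pairs are pulled out only when a query asks for them. The regular expression is a simplified assumption that handles bare values and straight-double-quoted values. It is not Splunk's field-extraction logic.

```python
import re

# Simplified, hypothetical pattern: key=value or key="quoted value".
KV_PATTERN = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')


def extract_fields(raw_event):
    """Apply structure at search time by parsing key=value pairs from a raw line."""
    fields = {}
    for key, quoted, bare in KV_PATTERN.findall(raw_event):
        fields[key] = quoted or bare
    return fields


def search(raw_events, **criteria):
    """Return the raw events whose search-time fields match every criterion."""
    return [
        event for event in raw_events
        if all(extract_fields(event).get(k) == v for k, v in criteria.items())
    ]


if __name__ == "__main__":
    raw = [
        '2012-12-05 07:04:44 City="New York" Country=US',
        '2012-12-05 07:05:10 City=Toronto Country=CA',
    ]
    print(search(raw, Country="US"))
```

Because nothing is parsed at ingest, a new field can be queried later without reloading the data. The trade-off is that extraction work is repeated at every search.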
Gain Critical Insights … in Real-time
[Diagram: correlating Customer ID, Twitter ID and the customer’s tweet]
Deep Visibility and Insight for IT and Business
• IT Operations Management
• Application Management
• Security and Compliance
• Web Intelligence
• Business Analytics
• Industrial Data / Internet of Things
Over 5,600 organizations using Splunk across IT and business users
from Big Data
The ShareThis Insights Platform
On Father’s Day: “What were the most shared-about topics?” and “What types of beer do people drink?”
Pre-aggregation Analytics
Finding the Optimal Approach
What should be the core focus or competency of your team?
• Hadoop and MapReduce are great for complex data science on data at rest; the previous architecture took 9 months with a team of engineers, data architects, etc.
• The Splunk platform delivers real-time, interactive analysis; we can build many of the same insights within 1 hour.
• Conclusion: find the optimal approach for the business.
Ad Hoc Analysis?
PR Insights Example
• What was the situation? (e.g. fast-moving business, needed real-time insights)
• What was the PR team struggling with? It was difficult to find useful data to build interesting use cases.
• What did they want? A flexible, real-time reporting environment to extract insights useful for the market.
• How did my team help? We delivered a single dashboard containing real-time data on the sharing behaviors across our network.
PR Insights Dashboard
Let’s not forget
The low-hanging fruit
Operational Analytics for an Online World
Driving Superior Customer Experience
How many 500 errors have I had over time?
Look for anomalies and spikes!
Zone in directly to the customer!
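Counting HTTP 500 errors over time and flagging spikes can be sketched with a simple rule. Bucket the errors per interval, then flag any bucket more than k standard deviations above the mean. The bucket size, the threshold k, the `(unix_time, status_code)` event format and the mean-plus-k-sigma rule are all illustrative assumptions, not the method shown in the deck.

```python
import statistics


def errors_per_bucket(events, start, end, bucket_s=60):
    """Count HTTP 500 responses per time bucket in [start, end).

    events: iterable of (unix_time, status_code) tuples (hypothetical format).
    Buckets with no errors are kept as zeros so quiet periods count toward the baseline.
    """
    counts = {b: 0 for b in range(start, end, bucket_s)}
    for ts, status in events:
        if status == 500 and start <= ts < end:
            bucket = start + (int(ts) - start) // bucket_s * bucket_s
            counts[bucket] += 1
    return counts


def find_spikes(counts, k=2.0):
    """Return bucket start times whose count exceeds mean + k * population stdev."""
    values = list(counts.values())
    if len(values) < 2:
        return []
    threshold = statistics.mean(values) + k * statistics.pstdev(values)
    return sorted(b for b, c in counts.items() if c > threshold)


if __name__ == "__main__":
    events = [(b + 5, 500) for b in range(0, 600, 60)]    # one error per minute
    events += [(300 + i, 500) for i in range(9)]          # burst at t=300
    print(find_spikes(errors_per_bucket(events, 0, 600)))  # [300]
```

Once a spike is flagged, you can drill into that bucket's raw events to find the affected customer.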
Online Device Notifications
[Diagram: notification flow from the API to Apple (APNS) and Google (GCM), with a feedback processor]
One More Thing …
Copyright © 2013 Splunk Inc.
New product from Splunk delivers interactive data exploration, analysis and visualizations for Hadoop
Splunk Analytics for Hadoop
Derive Actionable Insights from Raw Data
Point Splunk at Hadoop Cluster
Explore Analyze Visualize Dashboards Share
Immediately start exploring, analyzing and visualizing raw data in Hadoop