WHITE PAPER

Prognosis - An Approach to Predictive Analytics

Abstract
A prediction is a statement made about the future: an anticipatory vision or perception. This white paper discusses the emergence of technology that enables precise predictions in varied fields, and the application of exploratory and normative methods to augment decision making. Forecasting is primarily based on mining historical data sets, extracting hidden patterns and transforming them into valuable information through a process of classification, clustering, regression and association rule learning. The paper then describes Impetus’ implementation of Behavioral Targeting for the advertising world. Behavioral Targeting is a widely accepted application of statistical machine learning that helps select the most relevant ads to display to a web user based on that user’s historical data.

Impetus Technologies Inc. www.impetus.com November 2011


Table of Contents
Introduction
    Large scale data analytics
    Algorithms for forecasting and prediction
Behavioral Targeting
    Advantages and threats
    Industry impact
    Generic approach to BT problem solving
Large scale implementation of BT
    Poisson’s Linear Regression
    Implementing BT using Poisson’s Linear Regression
        1. Data Preparation
        2. Model Training
        3. Model Evaluation
Summary

Introduction
A prediction is a statement about the way things will happen in the future, often but not always based on experience or knowledge. Prediction is necessary to allow plans to be made about possible developments. Large corporations invest heavily in this kind of activity to help focus attention on possible events, risks and business opportunities. Such work brings together all available past and current data as a basis for developing reasonable expectations about the future. The basic idea behind any such algorithm is to gather large volumes of behavioral data describing the historical series of events, actions or behavior of the entity in question. This data is run through machine learning algorithms to derive models. The models serve as the basis for predictions, i.e. given input criteria, the models infer the expected behavior of the entity.

The application of prediction algorithms has gained prominence in a wide range of fields such as finance (stock market predictions), insurance (predicting life expectancy), science (weather forecasting, predicting natural disasters), medical science (treating developmental disabilities), marketing (behavioral targeting) and many more. Typically, with predictions, there is a huge amount of historical data, time is of the essence, and there is always current activity that affects the future. In many cases, freshness of data is a key factor and plays a major role in forecasting the future course of action. In other instances, the entire data set has equal relevance and contributes to determining the future.

Large scale data analytics
Projects related to prediction and forecasting face a huge increase in the amount of data that must not only be stored but also processed quickly and efficiently. These challenges are at once daunting and an exciting chance to use data to create a positive impact. Often, there is an immediate need to analyze the data at hand, to discover patterns, reveal threats, monitor critical systems, and make decisions about the direction the organization should take. Several constraints are always present: the need to implement new analytics quickly enough to capitalize on new data sources, limits on the scope of development efforts, and the pressure to expand mission capability without an increase in budgets.

For many of these applications, the large-scale data processing stack (which includes the simplified programming model MapReduce, distributed file systems, semi-structured stores, and integration components, all running on commodity class hardware) has opened up a new avenue for scaling out efforts and enabling analytics that were impossible in previous architectures. This new ecosystem has been found to be remarkably versatile at handling various types of data and classes of analytics. Perhaps the most exciting benefit of moving to these highly scalable architectures is that, once the immediate issues have been solved, often with a system that can handle today’s requirements and scale up to 10x or more, new analytics and capabilities can be developed, evaluated and integrated easily. This is owing to the speed and ease of MapReduce, Pig, Hive, and other technologies. More than ever, the large-scale data analysis software stack is proving to be a platform for innovation.

Algorithms for forecasting and prediction
There are several classes of statistical algorithms that are well suited to these kinds of problems, which are associated with trend analysis, pattern generation and artificial intelligence based predictions. Some of the most common ones are:
• Conjoint Analysis: Expert opinion and Delphi surveys
• Quantitative: Statistical, suited to predicting trends, e.g. Poisson’s Linear Regression, exponential smoothing
• Qualitative: Subjective, providing a range of possible outcomes, e.g. the Bayesian approach
• Statistical combination: A mix of quantitative and qualitative techniques, e.g. Quasi Bayes

Behavioral Targeting
Behavioral targeting (BT) leverages historical user behavior to select the most relevant ads to display. The state of the art in BT derives a Linear Poisson Regression model from fine-grained user behavioral data and predicts click-through rate (CTR) from user history. Behavioral targeting is an application of modern statistical machine learning methods to online advertising. Unlike other computational advertising techniques, however, BT does not primarily rely on contextual information such as the query (‘sponsored search’) or the web page (‘content match’). Instead, BT learns from past user behavior, especially implicit feedback (i.e., ad clicks), to match the best ads to users. This gives BT broader applicability, for example to graphical display ads, or at least makes it a valuable user dimension complementary to other contextual advertising techniques.

In today's practice, behaviorally targeted advertising inventory comes in the form of some kind of demand-driven taxonomy. Hierarchical examples are ‘Finance, Investment’ and ‘Technology, Consumer Electronics, Cellular Telephones’. Within a category of interest, a BT model derives a relevance score for each user from past activity. Should the user appear online during a targeting time window, the ad serving system will qualify this user (to be shown an ad in this category) if the score is above a certain threshold. One de facto measure of relevance is CTR, and the threshold is predetermined in such a way that both a desired level of relevance (measured by the cumulative CTR of a collection of targeted users) and the volume of targeted ad impressions (also called reach) can be achieved.

The impact of behavioral targeting can be negative if consumers feel annoyed or threatened by the use of their ‘personal’ data. However, as demonstrated by Amazon, when personal information and technology enhance the online experience, there is less risk of a negative response.
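The trade-off between relevance and reach in choosing the qualification threshold can be illustrated with a small sketch. This is not the ad serving system described in this paper: the function name pick_threshold, the (score, clicks, views) record layout and the sample numbers are all assumptions made for illustration. The sketch ranks users by score and returns the lowest cutoff at which the cumulative CTR of the targeted users still meets a required level, which maximizes reach for that level of relevance.

```python
def pick_threshold(scored_users, min_cumulative_ctr):
    """Return (threshold, reach, cumulative_ctr) or None.

    scored_users: list of (score, clicks, views) tuples for one category
    (a hypothetical layout). Users are targeted from the highest score down;
    the result is the lowest cutoff whose targeted set still reaches
    min_cumulative_ctr. Ties between equal scores are not treated specially.
    """
    ranked = sorted(scored_users, key=lambda u: u[0], reverse=True)
    best = None
    clicks = views = 0
    for score, c, v in ranked:
        clicks += c
        views += v
        ctr = clicks / views if views else 0.0
        if ctr >= min_cumulative_ctr:
            best = (score, views, ctr)   # reach measured in ad impressions
    return best


# Illustrative data only: (relevance score, observed clicks, observed views).
users = [(0.9, 3, 10), (0.7, 1, 10), (0.4, 1, 20), (0.1, 0, 60)]
choice = pick_threshold(users, min_cumulative_ctr=0.10)
print(choice)
```

With these sample numbers, lowering the cutoff from 0.9 to 0.4 raises reach from 10 to 40 impressions while the cumulative CTR falls from 0.30 to 0.125; including the last user would drop the cumulative CTR below the required 0.10.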

Advantages and threats
There are many advantages attributed to ad targeting and behavioral analysis, but it is also important to look at the downsides and surface the threats they pose. Some of the advantages that can be seen right away are:
• Reaching the right audience at the right time (of the day, week or life stage), with clear behavioral assumptions
• Standing out in a cluttered category
• Reaching target audiences when ‘context’ inventory is sold out (reaching the same target in alternative content)
• Avoiding the high cost of entry in desired content (reaching the same target in alternative content at lower cost)
• Tailoring the message to behavioral patterns to make it more relevant

As mentioned earlier, there are some downsides to BT:
• Achieving high reach is difficult. Within extremely targeted segments, the potential universe available may be very limited, and only a limited number of sites may currently allow behavioral targeting.
• Inconsistencies within segment classifications. The definition of a ‘common’ behavioral segment may differ by publisher (e.g., a job seeker searching Monster.com is not the same as a job seeker reading a job-related article on iVillage). Also, as the technology is cookie enabled, it suffers the usual issues of cookie stability and data accuracy.
• The ultimate issue of behavioral targeting clutter. Other advertisers within the same vertical will compete in the same space/segments. This is a future rather than a current issue, but in time today’s positives in cost, clutter and inventory availability will become challenges (as seen in paid search).

In the future, as targeting matures and advertisers have measurable results, historical data will be a key indicator of which assumptions work. This will provide optimization insights. Collecting and analyzing response data generated from different segments are important prerequisites for success.

Industry impact
Behavioral targeting, as a concept, has wide acceptance in the industry. Listed below are some use cases where it is being successfully implemented as a tool for predicting user behavior:
• Ad targeting and predicting the buying behavior of users
• Relationship building
• Audience targeting
• Presidential candidates using BT to target persuasion
• Treatment of mental disorders and developmental disabilities

There is a vast horizon where BT, or BT-based solutions, are being used to successfully predict and forecast behavior in order to increase reach, accessibility, and revenue.

Generic approach to BT problem solving
• Data mining involves extracting hidden patterns from data and transforming them into valuable information, using computing power to apply knowledge discovery methodologies.
• It applies knowledge discovery and prediction through a process of classification, clustering, regression and association rule learning.
• The value of the information depends on the collection of indicative and representative data.
• Cookies for behavioral advertising usually contain text that uniquely identifies the browser so that advertisers or ad networks can recognize the same Internet user across different Web sites or multiple areas on the same site.

Large scale implementation of BT
Poisson’s Linear Regression
This is a statistical method used to model the probability of a number of event occurrences, given the rate at which the event occurs in disjoint timeframes. It is suited to analyzing outcomes that are non-negative counts. Poisson’s Linear Regression works well where the input data is sparse, i.e. results are valid for rare events. It can model rare events when everyone is followed for the same length of time, or when people have different lengths of follow-up.
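As a small illustration of the underlying distribution, the sketch below computes the Poisson probability of observing k events when events occur at a given rate over a given follow-up period. The function name poisson_pmf, the exposure parameter and the sample rate of 0.02 clicks per day are assumptions chosen for this example; they do not come from the implementation described in this paper.

```python
import math


def poisson_pmf(k, rate, exposure=1.0):
    """Probability of exactly k events at `rate` events per unit time,
    observed over `exposure` units of time (the length of follow-up)."""
    mean = rate * exposure
    return math.exp(-mean) * mean ** k / math.factorial(k)


# A rare event (assumed 0.02 ad clicks per day) for two users followed for
# different lengths of time: one week and four weeks.
p_no_click_week = poisson_pmf(0, rate=0.02, exposure=7)
p_no_click_month = poisson_pmf(0, rate=0.02, exposure=28)
print(round(p_no_click_week, 3), round(p_no_click_month, 3))
```

The same rate yields different expected counts for different follow-up lengths, which is how the model accommodates users observed over unequal periods.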

Implementing BT using Poisson’s Linear Regression
Behavioral targeting can be effectively implemented using the Poisson’s Linear Regression algorithm, as it maps well to the nature of the input data and the kind of predictions that organizations are looking for. The algorithm is explained by the flow chart:

[Flow chart of the algorithm]

Impetus Technologies implemented behavioral targeting using the Poisson’s Linear Regression algorithm. The algorithm was deployed using the Hadoop ecosystem. The entire algorithm was decomposed into individual steps. Each step was implemented as a Hadoop M/R job, and the jobs were run sequentially using the Oozie workflow engine. The results of the implementation were models for different categories. These models were stored in the HBase data store and later consumed for analytics and behavioral predictions. The steps involved in the implementation are explained below.

1. Data Preparation
In this preprocessing step, the data fields of interest were extracted from raw data feeds, thus reducing the size of the data. The raw data described user behavior with respect to one or more ads. It included ad clicks, ad views, page views, searches, organic clicks and overture clicks.
1. The raw data came from the user base
2. The system stored the raw data in HDFS
3. The raw data was sent to the data preparation module, which undertook the following:
   a. Aggregated event counts over a configurable period of time, to further shrink the data size
   b. Merged counts into a single entry with <cookie, timeperiod> as the unique key
   c. Ran two M/R jobs: Feature-Extractor and Feature-Generator

1.1 Feature-Extractor
Input - Raw data feeds
Output - <cookie:time-period:feature-Type:feature-Name, featureCount>

1.2 Feature-Generator
Input - <cookie:time-period:feature-Type:feature-Name, featureCount>
Output - <cookie:time-period, feature-Type:feature-Name, featureCount ...>

2. Model Training
This step fitted the Linear Poisson Regression model to the preprocessed data and involved the following:
1. Feature selection
2. Generation of training examples
3. Model weight initialization
4. Multiplicative recurrence to converge model weights
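Before moving on to the training jobs, the two data preparation jobs from step 1 (Feature-Extractor and Feature-Generator) can be sketched in a few lines. This is a single-process approximation, not the Hadoop M/R code itself: the raw record layout (cookie, timestamp, feature type, feature name), the function names and the one-day default period are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def feature_extractor(raw_events, period_seconds=86400):
    """Count events per (cookie, period, feature_type, feature_name) key.

    raw_events: iterable of (cookie, timestamp, feature_type, feature_name)
    tuples; this layout is hypothetical. period_seconds is the configurable
    aggregation period.
    """
    counts = Counter()
    for cookie, timestamp, feature_type, feature_name in raw_events:
        period = timestamp // period_seconds
        counts[(cookie, period, feature_type, feature_name)] += 1
    return counts


def feature_generator(counts):
    """Merge the counts into one entry per (cookie, period) unique key."""
    merged = defaultdict(dict)
    for (cookie, period, ftype, fname), n in counts.items():
        merged[(cookie, period)][(ftype, fname)] = n
    return dict(merged)


raw = [
    ("c1", 10, "ad_click", "ad42"),
    ("c1", 20, "ad_view", "ad42"),
    ("c1", 30, "ad_view", "ad42"),
    ("c1", 90000, "page_view", "sports"),   # falls into the next day
    ("c2", 15, "search", "phones"),
]
entries = feature_generator(feature_extractor(raw))
for key in sorted(entries):
    print(key, entries[key])
```

In a MapReduce setting the first function corresponds to a map step emitting the composite key with a count of one followed by a summing reduce, and the second to a regrouping by the shorter <cookie, timeperiod> key.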

2.1 Poisson-Entity-Dictionary
This step mainly performed feature selection and inverted indexing. It did this by counting entity frequency in terms of touching cookies (the number of cookies in which an entity appears) and selecting the most frequent entities in the given feature space.

Output - Hashmap of <entityType:featureName, featureIndex> (inverted index) for all entity types

An entity referred to the name (unique identifier) of an event (e.g. an ad ID, a space ID for a page, or a query). An entity was different from a feature, since the latter was uniquely identified by the <featureType, featureName> pair. In the context of BT, there were three types of entities: ad, page and search. The Poisson entity dictionary included three M/R jobs: PoissonEntityUnit, PoissonEntitySum, and PoissonEntityHash.

2.2 Poisson-Feature-Vector
This step generated training examples (feature vectors) that were used directly by model initialization and multiplicative recurrence. It used a sparse data structure for feature vectors, since a dense representation would be populated primarily with zeros. Behavioral count data is very sparse by nature: for a given user, in a given time period, activity involves only a limited number of events. Impetus used a pair of arrays of the same length to represent a feature vector or a target vector: an integer array for feature indices and a float array for values (float to allow for decay), with each array index giving a <feature, value> pair.

Feature selection and inverted indexing: With the feature space selected by PoissonEntityDictionary, Impetus discarded the unselected events from the training data on the feature (input variable) side. On the target (response variable) side, Impetus had the option of using all features or only selected features when categorizing them into target event counts. With the inverted index built by PoissonEntityDictionary, from the PoissonFeatureVector step onwards Impetus referenced an original feature name by its index. The same idea was also applied to cookies, since the content of the cookie field was irrelevant.

Several pre-computations were performed at this stage:
1. Feature counts were further aggregated into a time window whose size was larger than or equal to the resolution from data preparation.
2. Counts were decayed over time using a configurable factor.
3. A causal approach was used to generate examples. (The causal approach collects features temporally before targets, while the non-causal approach generates targets and features from the same period of history.)

9

Prognosis – An Approach to Predictive Analytics

4. Impetus used binary representation (serialized objects in java) and data compression (Sequence file with BLOCK compression in Hadoop framework) for feature vectors. Data structure for the feature vector int[targetLength] targetIndex Array float[targetLength] targetValue Array int[inputLength] inputIndex Array float[inputLength] inputValue Array

Input - <cookie:timeperiod, featureType:featureName:featureCount ...> Output - <cookieIndex, featureVector>
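The feature selection, inverted index and pair-of-arrays representation described in 2.1 and 2.2 can be sketched as follows. This is an in-memory approximation under stated assumptions, not the PoissonEntityDictionary or PoissonFeatureVector jobs themselves: the function names, the top_n cutoff and the sample entries are hypothetical, and ties in frequency are broken by name purely to make the output deterministic.

```python
from collections import Counter


def build_entity_dictionary(entries, top_n):
    """Select the top_n features by number of touching cookies and return an
    inverted index {feature: featureIndex}."""
    cookies_per_feature = Counter()
    seen = set()
    for (cookie, _period), counts in entries.items():
        for feature in counts:
            if (cookie, feature) not in seen:   # count each cookie once per feature
                seen.add((cookie, feature))
                cookies_per_feature[feature] += 1
    ranked = sorted(cookies_per_feature.items(), key=lambda kv: (-kv[1], kv[0]))
    return {feature: index for index, (feature, _n) in enumerate(ranked[:top_n])}


def to_sparse(counts, dictionary):
    """Encode counts as parallel index and value arrays, dropping features
    that were not selected into the dictionary."""
    pairs = sorted((dictionary[f], float(n)) for f, n in counts.items() if f in dictionary)
    return [i for i, _ in pairs], [v for _, v in pairs]


# Entries keyed by (cookie, period), as produced by data preparation.
entries = {
    ("c1", 0): {("ad", "ad42"): 2, ("page", "sports"): 1},
    ("c1", 1): {("ad", "ad42"): 1},
    ("c2", 0): {("ad", "ad42"): 1, ("search", "phones"): 3},
    ("c3", 0): {("page", "sports"): 4},
}
dictionary = build_entity_dictionary(entries, top_n=2)
index_array, value_array = to_sparse(entries[("c2", 0)], dictionary)
print(dictionary, index_array, value_array)
```

Only the non-zero entries of each vector are stored, so memory use grows with a user's activity rather than with the size of the feature space.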

Target counts were collected from a sliding time window, and feature counts were aggregated (possibly with decay) from a time period preceding the target window. The size of the sliding window was kept relatively small, because a large window effectively discarded many <feature, target> co-occurrences within that window. For example, the following setup yielded superior long-term models:
a. A target window of size one day
b. Sliding over a one-week period
c. Preceded by a four-week feature window (also sliding along with the target window)
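A minimal sketch of this causal, sliding-window example generation is shown below. It makes several simplifying assumptions that are not taken from the Impetus implementation: events are pre-bucketed by day, each day already separates feature counts from target counts, decay is applied per day of age, and the optional gap_days parameter leaves a gap between the feature window and the target window. All names are hypothetical.

```python
def make_examples(daily_events, feature_days, target_days, gap_days=0, decay=1.0):
    """Build (features, targets) pairs with features strictly before targets.

    daily_events: list indexed by day; each item is a pair of dicts
    (feature_counts, target_counts). This layout is an assumption.
    """
    examples = []
    first_target = feature_days + gap_days
    for target_begin in range(first_target, len(daily_events) - target_days + 1):
        feature_end = target_begin - gap_days          # exclusive boundary
        feature_begin = feature_end - feature_days
        features = {}
        for day in range(feature_begin, feature_end):
            age = feature_end - 1 - day                # 0 for the most recent day
            for name, n in daily_events[day][0].items():
                features[name] = features.get(name, 0.0) + n * decay ** age
        targets = {}
        for day in range(target_begin, target_begin + target_days):
            for name, n in daily_events[day][1].items():
                targets[name] = targets.get(name, 0.0) + n
        examples.append((features, targets))
    return examples


days = [
    ({"view": 1}, {}),
    ({"view": 2}, {}),
    ({}, {"click": 1}),
    ({"view": 1}, {"click": 2}),
]
examples = make_examples(days, feature_days=2, target_days=1, decay=0.5)
print(examples)
```

Each step of the target window produces one training example, so a small target window sliding over a longer period yields many <feature, target> co-occurrences from the same history.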

The algorithm included the following:
1. For each cookie, Impetus cached all the event count data.
2. It sorted events by time, forming an event stream for that cookie covering the entire time period of interest.
3. Impetus pre-computed the boundaries of the sliding window. Four boundaries were specified: featureBegin, featureEnd, targetBegin and targetEnd. Separating featureEnd and targetBegin allowed a gap window in between, which was necessary to emulate possible latency in online prediction.
4. Impetus maintained three iterators on the event stream, referencing the previous featureBegin, the current featureBegin, and targetBegin. It used one pair of TreeMap objects (i.e. inputMap and targetMap) to hold the features and targets of a feature vector as the data was being processed.

2.3 Poisson-Initializer
This step initialized the model weights (the coefficients of the regressors) by scanning the training data once. The notation is:
k: index of target variables
j: index of features or input variables
i: index of examples
unigram(j): one occurrence of feature j
bigram(k,j): one co-occurrence of target k and feature j

The basic idea of this bigram-based initialization was to set the weight w(k,j) to a normalized number of co-occurrences of (k,j). The output of PoissonInitializer was an initialized weight matrix of dimensionality number of targets by number of features.
1. Impetus distributed the computation of counting bigrams by a composite key <k,j> and effectively pre-computed the total bigram counts of all examples before the final stage.
2. The M/R framework provided a single-key data structure. In order to distribute <k,j>, Impetus needed an efficient function to transform a composite key (two integers) into a single key and to recover the composite key when needed:
   bigramKey(k,j) = a long integer obtained by a bitwise left shift of k by 32 bits, followed by a bitwise OR with j
3. The Impetus team cached the output of the first mapper, which emitted <bigramKey, bigramCount>.

2.4 Poisson-Multiplicative
This step updated the model weights by scanning the training data iteratively, using a highly effective multiplicative recurrence. Computing the normalizer (the Poisson mean) involved taking the dot product of the previous weight vector and a feature vector (the input portion).

Input - <cookieIndex, featureVector>
Output - updated w_k for all k
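The composite key and the two training passes can be sketched together. The key packing follows the description in 2.3 directly. The rest rests on assumptions, because this paper does not state the formulas: the initializer normalizes each bigram count by the unigram count of its feature, and the update is the standard multiplicative rule for a Poisson likelihood with a linear mean, w(k,j) <- w(k,j) * sum_i(y_ik * x_ij / mean_ik) / sum_i(x_ij), where mean_ik is the dot product of w_k and x_i. Function names and the toy data are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def bigram_key(k, j):
    """Pack two non-negative 32-bit integers into one key: (k << 32) | j."""
    return (k << 32) | j


def split_key(key):
    """Recover the composite key (k, j) from a packed key."""
    return key >> 32, key & MASK32


def init_weights(examples, num_targets, num_features):
    """Bigram-based initialization; dividing by unigram(j) is an assumed normalization."""
    bigrams, unigrams = {}, [0.0] * num_features
    for x, y in examples:                       # x, y: dicts of index -> count
        for j, xj in x.items():
            unigrams[j] += xj
            for k, yk in y.items():
                key = bigram_key(k, j)
                bigrams[key] = bigrams.get(key, 0.0) + yk * xj
    w = [[0.0] * num_features for _ in range(num_targets)]
    for key, count in bigrams.items():
        k, j = split_key(key)
        w[k][j] = count / unigrams[j]
    return w


def multiplicative_step(examples, w):
    """One pass of the assumed multiplicative update over all examples."""
    num_targets, num_features = len(w), len(w[0])
    numer = [[0.0] * num_features for _ in range(num_targets)]
    denom = [0.0] * num_features
    for x, y in examples:
        for j, xj in x.items():
            denom[j] += xj
        for k, yk in y.items():
            mean = sum(w[k][j] * xj for j, xj in x.items())   # Poisson mean w_k . x_i
            if mean > 0:
                for j, xj in x.items():
                    numer[k][j] += yk * xj / mean
    return [[w[k][j] * numer[k][j] / denom[j] if denom[j] else 0.0
             for j in range(num_features)] for k in range(num_targets)]


# Toy data in which the target count is always twice the feature count.
examples = [({0: 2.0}, {0: 4.0}), ({0: 1.0}, {0: 2.0})]
w0 = init_weights(examples, num_targets=1, num_features=1)
w1 = multiplicative_step(examples, w0)
print(w0, w1)
```

Because the update is multiplicative, a weight that starts at zero stays at zero, so initializing from observed co-occurrences also fixes which weights can ever become non-zero. In a language with fixed-width integers, such as Java, k would need to be widened to a long before shifting and j masked to 32 bits before the OR.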

1. Impetus represented the model weight matrix as K dense weight vectors (arrays) of length J, where K was the number of targets and J the number of features.
2. Using weight vectors was more scalable in terms of memory footprint than a matrix representation, but it raised challenges in disk I/O. Impetus addressed this problem via in-memory caching. Caching weight vectors was not the solution; the trick was to cache input examples. After caching, Impetus maintained a hashmap that recorded all relevant targets for the cached feature vectors and provided constant-time lookup from target index to array index: Map<targetIndex, arrayIndex>.
3. Impetus also used Hadoop's distributed cache, which copied the requested files from HDFS to the slave nodes before the task was executed. It copied the files only once per job for each task tracker, and the copy was shared by M/R tasks.

3. Model Evaluation
This step tested the trained model on a test data set. The main tasks were:
1. Predicting expected target counts (clicks and views)
2. Scoring (CTR)
3. Ranking the scores of a test set
4. Calculating and reporting performance metrics such as CTR lift and area under the ROC curve

This component contained three sequential steps:

3.1 Poisson-Feature-Vector-Eval
This step was identical to Poisson-Feature-Vector, except for the following:
• There was no need to keep the summary statistics used for training, such as the total count of examples and the feature and target unigrams.
• Decay was typically necessary in generating test data, since it enabled efficient incremental prediction as new events flowed in, while exponentially diminishing the obsolete long history.
• Sampling and heuristic-based robot filtering were not applied when generating test data.
• Impetus could remove examples without a target from the test data set, since these records did not affect the performance, no matter how the model predicted them. However, examples with targets were kept, even those without any inputs. This was because these records (‘new users’) had to be scored by the model in production and hence had a non-trivial impact on the performance.
• Impetus categorized target counts either from the entire feature space or from the selected space, depending on the learning goal.
• The size of the sliding window was configured to be approximately the same as the ad serving cycle in production, and the size of the gap window imitated the latency between the last seen events and the next ad serving in production.

3.2 Poisson-Predictor
Input - <cookieIndex, featureVector>
Output - <cookieIndex, predictedActualtarget[2 x numTarget]>

This step took the dot product of a weight vector and a feature vector as the predicted target count (a continuous variable). To predict the expected number of ad clicks and views in all categories for an example i, the algorithm needed to read the weight vectors of all targets converged from Poisson-Multiplicative.

3.3 Poisson-Evaluator
Input - <cookieIndex, predictedActualtarget[2 x numTarget]>
Output - performance metrics, per-category and overall reports

This step scored each test example by dividing its predicted clicks by its predicted views and applying Laplacian smoothing. It then sorted all examples by score and finally computed and reported the performance metrics. The performance metrics included:
• The number of winning categories over certain benchmarks
• Cumulative CTR
• CTR lift
• Area under the ROC curve
• Summary statistics

It generated both per-category results and overall performance reports.
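The prediction and evaluation steps above can be illustrated with the following sketch. It is not the Poisson-Predictor or Poisson-Evaluator code: the function names are hypothetical, the smoothing constants alpha and prior_ctr are illustrative values, and the exact form of Laplacian smoothing and of CTR lift used in the implementation is not given in this paper, so simple textbook definitions are assumed.

```python
def predict(weights, index_array, value_array):
    """Predicted count: dot product of a dense weight vector and a sparse
    feature vector stored as parallel index and value arrays."""
    return sum(weights[i] * v for i, v in zip(index_array, value_array))


def smoothed_ctr(pred_clicks, pred_views, alpha=1.0, prior_ctr=0.01):
    """Predicted clicks over predicted views with additive (Laplacian) smoothing.
    alpha and prior_ctr are assumed values, not taken from the paper."""
    return (pred_clicks + alpha * prior_ctr) / (pred_views + alpha)


def roc_auc(scores, labels):
    """Area under the ROC curve as the fraction of (positive, negative) pairs
    ranked correctly, counting ties as half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ctr_lift(scores, clicks, views, top_fraction):
    """CTR of the top-scored fraction of examples divided by the overall CTR."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = order[:max(1, int(len(order) * top_fraction))]
    top_ctr = sum(clicks[i] for i in top) / sum(views[i] for i in top)
    return top_ctr / (sum(clicks) / sum(views))


click_weights = [0.2, 0.0, 0.1]
view_weights = [1.0, 2.0, 1.0]
pred_clicks = predict(click_weights, [0, 2], [2.0, 1.0])
pred_views = predict(view_weights, [0, 2], [2.0, 1.0])
score = smoothed_ctr(pred_clicks, pred_views)

scores = [0.9, 0.8, 0.3, 0.1]
auc = roc_auc(scores, labels=[1, 0, 1, 0])
lift = ctr_lift(scores, clicks=[3, 1, 1, 0], views=[10, 10, 20, 60], top_fraction=0.5)
print(score, auc, lift)
```

Smoothing keeps the score finite and stable for examples with very few predicted views, which matters for the ‘new users’ that have targets but little or no input history.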


Summary
As explained above, a prediction is a statement made about the future. A very popular area of application that has flourished in recent times is behavioral targeting (BT). BT is a large-scale machine learning problem that leverages historical user behavior to select the most relevant ads to display. The process involves mining historical data sets and extracting hidden patterns (trends) to predict user interests. Major IT companies such as Yahoo, Google and Amazon have used behavioral targeting and achieved major gains in terms of reach and CTR. There are several implementations of BT that employ various statistical algorithms and processes to extract the behavioral traits of the users in question.

The input to the BT engine is a historical sequence of the activities undertaken by users over the Internet. These activities include ad clicks, ad views, page views, search queries and search clicks. As users browse the Internet, they unknowingly leave a trail of footprints in terms of visited pages, ads, cookies, etc. These footprints reveal a lot about their personality traits. BT leverages these subtle inputs and, without intruding on users’ privacy, draws a personality sketch of each user. Based on these inferences, advertisers are able to target their audience and show them relevant ads.

Impetus applied the Poisson’s Linear Regression algorithm for its implementation. This was deployed on the Hadoop environment using chained MapReduce jobs as an Oozie workflow.

About Impetus
Impetus Technologies offers Product Engineering and Technology R&D services for software product development. With ongoing investments in research and application of emerging technology areas, innovative business models, and an agile approach, we partner with our client base comprising large scale ISVs and technology innovators to deliver cutting-edge software products. Our expertise spans the domains of Big Data, SaaS, Cloud Computing, Mobility Solutions, Test Engineering, Performance Engineering, and Social Media among others.

Impetus Technologies, Inc.
5300 Stevens Creek Boulevard, Suite 450, San Jose, CA 95129, USA
Tel: 408.252.7111 | Email: inquiry@impetus.com
Regional Development Centers - INDIA: New Delhi • Bangalore • Indore • Hyderabad
To know more visit: http://www.impetus.com

Disclaimers
The information contained in this document is the proprietary and exclusive property of Impetus Technologies Inc. except as otherwise indicated. No part of this document, in whole or in part, may be reproduced, stored, transmitted, or used for design purposes without the prior written permission of Impetus Technologies Inc.
