
DMA Analytics Community

Application of Supervised Machine Learning techniques on Semi-Structured Social Media data for Classification

1/19/17
Analytics Community
» Learn and network – attend the 2017 Marketing Analytics Conference and a Regional Roundtable.

» Acquire new knowledge – check out the 2016 Analytics Journal today.

» Increase your visibility – sponsorship opportunities for every company and budget, and advertising space in the Analytics Journal.

» Get published – submit an article as part of the Analytics Advantage Blog Series.

» Establish yourself – present a relevant topic or case study to a specific Community. Submit your ideas to Laura Gigliotti for consideration!

» Build your resume – volunteer on our leadership council and assist in content planning and promotion.

» Contact – Laura Gigliotti at lgigliotti@thedma.org or 212.790.1536 to learn more and get involved!

» Website – thedma.org/acc

Presenter
Damon Samuel, Director of Data Science at RCG Global Services, brings nearly 20 years of analytical experience. Samuel has built models for numerous industries, including insurance, automotive, retail, credit, pharmaceuticals, telecom, staffing, and utilities. These models have touched IT, finance, marketing, real estate, and more. Samuel has been recognized by the Advertising Research Foundation as a top researcher, was a board member of the Marketing Science Institute in 2015, and is acknowledged in “Profiting from the Data Economy” by David Schweidel.

Topics
Title in English

Refresher on Machine Learning Techniques

Structure of Data

Sources of Data

Teaching the computer to read

A few tools that can help

Why bother anyway

Rehash
Title in English
Application of Supervised Machine Learning techniques on Semi-Structured Social Media data for Classification

Using algorithms to sort social media posts into known groups
Refresher on Machine Learning
History: Statistics, Predictive, Data Mining, Analytics, Classification, Machine Learning
Supervised: Regression techniques, Neural Nets, Naïve Bayes, Decision Trees (incl. Random Forests), Genetic Algorithms, Support Vector Machines, Discriminant Analysis
Unsupervised (beyond today's scope): all of the supervised predictive techniques, plus Clustering, Principal Components, Graph Analytics, Term Document Matrix
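The Term Document Matrix listed above is the table most text models start from: one row per term, one column per post, each cell a count. A minimal Python sketch follows. It assumes the posts are already cleaned and tokenized, and the function name and toy posts are hypothetical. R's tm package (listed among the tools) provides its own TermDocumentMatrix for the same job.

```python
from collections import Counter

def term_document_matrix(docs):
    """Build a term-document matrix: rows are terms, columns are documents,
    and each cell counts how often the term appears in that document."""
    terms = sorted({t for doc in docs for t in doc})   # vocabulary, alphabetical
    counts = [Counter(doc) for doc in docs]            # per-document term counts
    matrix = [[c[t] for c in counts] for t in terms]   # Counter returns 0 if absent
    return terms, matrix

# Hypothetical, already-tokenized posts.
docs = [["love", "app"], ["app", "crashing", "app"]]
terms, matrix = term_document_matrix(docs)
for term, row in zip(terms, matrix):
    print(f"{term:10}", row)
```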
Structure of Data

Flavors

Structured

Semi-Structured

Unstructured
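To make the three flavors concrete: a social media post usually arrives as semi-structured data, such as JSON, with a few structured fields (IDs, timestamps, counts) wrapped around an unstructured free-text body. The sketch below assumes a hypothetical JSON layout. The field names are illustrative and do not reflect any platform's actual schema.

```python
import json

# Hypothetical post; these field names are assumptions for illustration only.
RAW_POST = (
    '{"id": "12345", "created_at": "2017-01-19T10:30:00", "likes": 42, '
    '"text": "Loving the #analytics talk @friend lol :) http://example.com"}'
)

def split_post(raw):
    """Separate the structured fields of a JSON post from its free-text body."""
    post = json.loads(raw)
    text = post.pop("text", "")  # unstructured part: needs text processing first
    return post, text            # remaining keys: structured, model-ready fields

fields, text = split_post(RAW_POST)
print(fields)  # {'id': '12345', 'created_at': '2017-01-19T10:30:00', 'likes': 42}
print(text)
```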
Sources of Data
Internal Sources

Comments

Amazon Reviews

Likes

Emails
Teaching the computer to read
Is this English?

Hashtags

@ tags (mentions)

Emojis

Typos

Codes

Abbreviations

LOL speak

Links
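Several of these non-English elements can be pulled out with regular expressions before any modeling. Below is a rough Python sketch using the standard re module. The patterns are simplified assumptions: the emoji ranges in particular are not exhaustive. The sample post is made up.

```python
import re

# Simplified patterns (assumptions, not a complete tokenizer):
HASHTAG = re.compile(r"#(\w+)")        # also matches URL fragments like page#top
MENTION = re.compile(r"@(\w+)")        # also matches the domain part of emails
LINK = re.compile(r"https?://\S+")     # a link runs until the next whitespace
# Rough emoji ranges: pictographs/emoticons block plus misc symbols and dingbats.
EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")

def extract_features(post):
    """Pull the 'not quite English' tokens out of a post."""
    return {
        "hashtags": HASHTAG.findall(post),
        "mentions": MENTION.findall(post),
        "links": LINK.findall(post),
        "emojis": EMOJI.findall(post),
    }

post = "omg luv the new ad 😍 #BrandX @friend check it http://example.com/ad"
print(extract_features(post))
```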
Teaching the computer to read
Exploration and cleaning

Regex

Stop words

Stems

N-grams

Word clouds

Case

Substitution

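# Strip characters with no ASCII equivalent (e.g., emojis), then lowercase the titles.
# iconv() and tolower() are vectorized, so no sapply() is needed.
s2 <- data.frame(uniqueid = swpTitles$uniqueid,
                 title = tolower(iconv(swpTitles$title, "latin1", "ASCII", sub = "")),
                 stringsAsFactors = FALSE)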
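The cleaning steps above chain together naturally. The Python sketch below assumes tiny hypothetical stop-word and slang lists, standing in for a full stop-word list (such as the one in R's tm) and a curated abbreviation dictionary. Stemming is left out because a real stemmer, such as SnowballC in R or the stemmers in nltk, handles it better. The word counts at the end are the frequencies a word cloud displays.

```python
import re
from collections import Counter

# Tiny illustrative lists (hypothetical); real work needs full, curated lists.
STOP_WORDS = {"the", "a", "an", "is", "it", "to", "and", "of", "i", "this"}
SLANG = {"luv": "love", "u": "you", "gr8": "great"}

def clean(post):
    """Lowercase, drop links and non-ASCII, substitute slang, remove stop words."""
    text = post.lower()                                # Case
    text = re.sub(r"https?://\S+", " ", text)          # Regex: strip links
    text = text.encode("ascii", "ignore").decode()     # drop emojis / non-ASCII
    tokens = re.findall(r"[a-z0-9']+", text)           # keep word-like tokens
    tokens = [SLANG.get(t, t) for t in tokens]         # Substitution
    return [t for t in tokens if t not in STOP_WORDS]  # Stop words

def ngrams(tokens, n):
    """Return consecutive n-token sequences joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

posts = ["I luv this brand!! http://example.com", "Gr8 service, luv the new app 😀"]
cleaned = [clean(p) for p in posts]
counts = Counter(t for toks in cleaned for t in toks)  # input for a word cloud
print(cleaned)
print(ngrams(cleaned[1], 2))
print(counts["love"])
```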
Teaching the computer to read

Building the model

Unsupervised

Supervised
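For the supervised branch, Naïve Bayes (one of the techniques on the refresher slide) is a common first text classifier. The sketch below is a from-scratch multinomial Naïve Bayes with add-one smoothing, written in plain Python so it runs without extra packages. In practice, RTextTools in R or sklearn in Python would do this. The labels (praise, complaint) and the posts are hypothetical.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesText:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing over token lists."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.word_counts = defaultdict(Counter)      # class -> token frequencies
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {t for tokens in docs for t in tokens}
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)         # log prior P(class)
            for t in tokens:
                if t in self.vocab:                  # skip tokens unseen in training
                    score += math.log((self.word_counts[label][t] + 1) /
                                      (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical labeled posts, already cleaned and tokenized.
train_docs = [["love", "new", "app"], ["great", "service", "love"],
              ["app", "keeps", "crashing"], ["terrible", "service", "refund"]]
train_labels = ["praise", "praise", "complaint", "complaint"]

model = NaiveBayesText().fit(train_docs, train_labels)
print(model.predict(["love", "service"]))     # praise
print(model.predict(["crashing", "refund"]))  # complaint
```

Summing log probabilities rather than multiplying raw probabilities avoids numerical underflow on long posts. Add-one smoothing keeps a word seen in only one class from driving the other class's probability to zero.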
A few tools that can help

Software

Hadoop

Python

Spark
A few tools that can help

Packages

Language   Package                Use
R          library(tm)            Text mining, including stop words
R          library(SnowballC)     Word stemming
R          library(ngram)         Makes n-grams
R          library(RTextTools)    Machine learning for text data
R          library(wordcloud)     Makes word clouds
Python     import re              Regular expressions
Python
Python     import nltk            Natural Language Toolkit
Python     import sklearn         Machine learning
Why Bother Anyway
Insights and Inputs

Response model

DOE (design of experiments) for creative

Lookalike modeling

Market research
Rehash
We covered Application

We covered Supervised

We covered Machine Learning techniques

We covered Semi-Structured

We covered Social Media data

We covered Classification

We now understand the Application of Supervised Machine Learning techniques on Semi-Structured Social Media data for Classification in plain English.
Questions?

Damon.Samuel@rcggs.com