An Introduction To Social Network Data

David M Walker Data Management & Warehousing May 2012

May 2012


© 2012 Data Management & Warehousing

Hi, I’m on Facebook!

S  I’m one of 900 Million people as of May 2012 that has a

Facebook account
S  That’s more than 1 in 8 of every man, woman and child on the

planet (and the 6 crew of the International Space Station) regardless of age, race, religion, location, sexuality, etc.
S  I’ve also completed my profile – it helps my family & friends find

and communicate with me
S  It even reminds people to wish me ‘Happy Birthday’


My Profile Page


But what am I sharing?

- Depending on my privacy settings I will be sharing anything from ‘some data’ to ‘everything about my life’
- You can edit your privacy settings here:
  - https://www.facebook.com/settings/?tab=privacy
- Remember:
  - Today’s ‘friends’ may not be tomorrow’s friends
  - Sharing with family, school/work colleagues can have unexpected consequences


How is this data used?

S  Developers use this data to ‘profile’ people S  This is both free to use and easy to do S  Uses an Application Programming Interface (API) based on

a URL

S  Jargon for ‘just connect to the website with the right options’

S  Try it: S  https://developers.facebook.com/tools/explorer


George H Takei

S  S  S  S 

Helmsman Sulu in Star Trek (The Original Series) Gay Rights and Japanese American Internment Activist Popular Facebook Page (1,962,290 likes) and secured Basic Info
S  S 

https://developers.facebook.com/tools/explorer?method=GET&path=205344452828349 https://graph.facebook.com/205344452828349

S 

Photographs
S  S 

https://developers.facebook.com/tools/explorer? method=GET&path=205344452828349%2Fphotos https://graph.facebook.com/205344452828349/photos
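The URLs on this slide follow a simple pattern: the Graph API address is the object's id, optionally followed by a connection such as photos, and the Explorer link carries the same path as a URL-encoded query parameter. The sketch below only rebuilds those URL shapes in Python. The function names are hypothetical, and it assumes nothing beyond the four links shown above. In particular, the live API today generally expects an access token, which this sketch does not handle.

```python
from urllib.parse import urlencode

# Base addresses taken from the links on this slide.
GRAPH_BASE = "https://graph.facebook.com"
EXPLORER_BASE = "https://developers.facebook.com/tools/explorer"


def graph_path(object_id, connection=None):
    """Return the path part, e.g. '205344452828349/photos'."""
    return f"{object_id}/{connection}" if connection else str(object_id)


def graph_url(object_id, connection=None):
    """Build the direct Graph API address for an object or one of its connections."""
    return f"{GRAPH_BASE}/{graph_path(object_id, connection)}"


def explorer_url(object_id, connection=None):
    """Build the matching Graph API Explorer link (path is URL-encoded)."""
    query = urlencode({"method": "GET", "path": graph_path(object_id, connection)})
    return f"{EXPLORER_BASE}?{query}"


if __name__ == "__main__":
    page_id = "205344452828349"
    print(graph_url(page_id))
    print(graph_url(page_id, "photos"))
    print(explorer_url(page_id, "photos"))
```

This is what 'just connect to the website with the right options' amounts to in practice: the options are the object id and the connection name.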


George H Takei’s photo and its data
George Takei posted this photograph

API Output (Snippet):
{
  "id": "373438362685623_1722672",
  "from": {
    "name": "Trevor Mullins",
    "id": "1024732813"
  },
  "message": "This. So much this.",
  "created_time": "2012-02-09T03:43:56+0000",
  "likes": 3
}

It Tells Me:
- Trevor Mullins was one of several hundred people who commented on this photo
- He did so at 03:43:56 GMT on 9th Feb 2012
- Which 3 people liked the comment
- And from his profile: his username is Ertrov, he describes himself as “Agnostic-atheist/Antitheist”, is male, likes Sci-Fi, and is affiliated to Sinclair Community College, Ohio and many, many more things
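Reading facts out of the API output takes very little code. The following Python sketch parses the snippet from this slide with the standard library only. The function name and the keys of the returned summary are hypothetical. Note that the snippet itself carries only a like count, so naming the three people who liked the comment, or reading the commenter's profile, would need further requests that are not shown here.

```python
import json
from datetime import datetime, timezone

# The API output snippet from the slide, unchanged.
SNIPPET = """
{
  "id": "373438362685623_1722672",
  "from": {"name": "Trevor Mullins", "id": "1024732813"},
  "message": "This. So much this.",
  "created_time": "2012-02-09T03:43:56+0000",
  "likes": 3
}
"""


def summarise_comment(raw):
    """Pull out who commented, when, and how many likes the comment has."""
    comment = json.loads(raw)
    created = datetime.strptime(comment["created_time"], "%Y-%m-%dT%H:%M:%S%z")
    return {
        "commenter": comment["from"]["name"],
        "commenter_id": comment["from"]["id"],
        "created_utc": created.astimezone(timezone.utc),
        "like_count": comment["likes"],
    }


summary = summarise_comment(SNIPPET)
print(summary["commenter"], summary["created_utc"].isoformat(), summary["like_count"])
```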

Back to me – My profile contains:
id: Facebook's unique reference number for me
name: My Full Name
username: My Username
birthday: My Date of Birth
hometown: Where I was born
location: Where I live now
employer: Who I work for
employer: Who I used to work for
  projects: Which projects I worked on for that employer
sports: Which sports I like
favorite_teams: Who are my favourite teams
education: Where I went to school
  year: And when I left
  type: And what type of school it was
gender: My Gender
relationship_status: Am I married?
email: My private email
website: My website
timezone: My timezone
locale: What language I read Facebook in
languages: What languages I speak
verified: Have I verified my email address
updated_time: When did I last update my profile
type: What type of user account do I have

These are just some of the fields I could populate and developers could access

I like …

S  If I ‘like’ a product or brand on Facebook then the owner of that

brand can use the developers interface to get information about me and others who ‘like’ their product sexual preference (‘interested in’) and location of the ‘likers’

S  For example the developer can get the age, marital status, gender, S  The developer can then look for groups of people who share the
S  This is called Cluster Analysis – looking for groups of similar

same characteristics (e.g. 18-25, single, female, straight, Liverpool)
people
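The idea of looking for groups of 'likers' with the same characteristics can be sketched in a few lines of Python. This is the crudest possible form of the technique: it counts exact matches on a handful of attributes rather than running a statistical clustering algorithm. The sample people, the field names and the age bands are all invented for illustration and are not Facebook's schema.

```python
from collections import Counter

# Hypothetical 'likers'; field names are assumptions, not Facebook's schema.
likers = [
    {"age": 22, "status": "single", "gender": "female", "interested_in": "male", "location": "Liverpool"},
    {"age": 19, "status": "single", "gender": "female", "interested_in": "male", "location": "Liverpool"},
    {"age": 24, "status": "single", "gender": "female", "interested_in": "male", "location": "Liverpool"},
    {"age": 41, "status": "married", "gender": "male", "interested_in": "female", "location": "Reading"},
    {"age": 33, "status": "single", "gender": "male", "interested_in": "male", "location": "London"},
]


def age_band(age):
    """Bucket an age into a coarse band (band edges are an assumption)."""
    if age < 18:
        return "under 18"
    if age <= 25:
        return "18-25"
    if age <= 35:
        return "26-35"
    return "36+"


def profile_key(person):
    """The combination of characteristics used to decide who is 'similar'."""
    return (
        age_band(person["age"]),
        person["status"],
        person["gender"],
        person["interested_in"],
        person["location"],
    )


def largest_groups(people, top=3):
    """Return the most common characteristic combinations and their sizes."""
    return Counter(profile_key(p) for p in people).most_common(top)


for key, size in largest_groups(likers):
    print(size, key)
```

With real data the interesting groups are rarely exact matches, which is where statistical clustering tools come in.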


This data is valuable: Very Very Valuable
S  Once the developer has identified a ‘cluster’ of people they can

ask Facebook to advertise to others who don’t yet ‘like’ the product but share the same characteristics as those that do
S  For example, based on our previous cluster, a nightclub may want

to target adverts to similar people in their area
S  Facebook makes this very easy to do, you just go here: S  https://www.facebook.com/ads/manage/adscreator/


Very precise targeting – know exactly who is going to see your advert


Target audiences using their stated preferences


Very low cost – Know exactly how much you are going to spend
From an advertiser’s point of view this is very cost effective
For Facebook – done at scale – it is very, very profitable

Dealing with the data

S  We can look at individuals manually S  We can deal with ‘small’ data sets with a spread sheet S  50,000 rows i.e. 50,000 individuals S  250 columns i.e. 250 different characteristics S  We can deal with ‘larger’ data sets with statistical tools S  There are commercial and open source tool to do the stats S  For example: ‘R’ is free and provide direct access to the Facebook API and functions to do complex cluster analysis

Advanced Techniques

S  Exploiting the social network
S  Which of my ‘likers’ know each other? S  Is it possible to identify an individual in the group who is the

‘ring-leader’ S  Can the ring-leader be influenced towards my offering/product S  Can the ring-leader influence others to follow them?
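One simple way to approach the 'ring-leader' question is to count how many other people in the group each person knows and rank them. The Python sketch below does that with invented friendships, using initials only. Counting connections (degree) is just one possible measure, chosen here for brevity. It is not necessarily the measure used in the projects this presentation draws on.

```python
from collections import defaultdict

# Hypothetical friendships between 'likers' (initials only).
friendships = [
    ("AB", "CD"),
    ("AB", "EF"),
    ("AB", "GH"),
    ("CD", "EF"),
    ("IJ", "KL"),
]


def build_graph(pairs):
    """Turn a list of friendship pairs into a person -> set of friends map."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def ring_leader_candidates(graph):
    """Rank people by how many others they know, most connected first."""
    return sorted(graph, key=lambda person: (-len(graph[person]), person))


graph = build_graph(friendships)
ranking = ring_leader_candidates(graph)
print(ranking[0], "knows", sorted(graph[ranking[0]]))
```

Whether the most connected person can actually influence the others is a separate question that the connection count alone does not answer.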


My Social Network

Small groups of friends that don’t know each other

Detail – Friends who know each other (initials only for confidentiality)

This group all worked on a project together

A group of friends who I watch rugby with

A tight knit group of friends from where I used to work

Sentiment Analysis

S  Analyse peoples comments and use this to change your interaction

with the you customer

S  Use feedback (positive and negative) to respond to customers –

remember you are looking for the main affect, you will always have people who have a minority opinion
S  “Don’t like the new flavour” S  “Wish the new website had a help button”

S  Simple Examples

S  There are plenty of more sophisticated examples


Applications

S  Facebook also allows users to develop Applications S  Socialcam (54M users), Cityville (35M users) S  Texas HoldEm (35M users), DrawSomething (29M users) S  Allows users to buy virtual tokens with real money S  This in itself is a revenue generating stream S  Allows developers to place very targeted adverts S  Revenue derived from selling targeted marketing S  Allows developers to monitor social interactions for new trends S  Who do you ‘Draw Something’ with?


Third Party Vetting

S  Looking for a new job?
S  Someone you are friends with may also know someone at your

new employer – what information will they share? S  Your social activities – don’t post that you are out partying and then call in sick S  Don’t tell the world what you think of your boss, even after you leave the organisation – you might need a reference from him or your new employer might not want to expose themselves in the future
S  Journalists looking for background
S  Those grainy news photos are often found on social websites

Coffee with my son

S  One day I had coffee with my son, I took this photo and uploaded

it to Facebook, tagging him and adding the place
S  S  S  S  S 

S  Facebook stored the following data:

The exact date, time & GPS location of where I checked in The details of the person I was with The application on my iPhone that I used to upload the picture The people who commented, their comments and their profile And more

S  But the photograph told another part of the story …


Photographic Data

S  Digital Cameras store data too S  This is called Metadata (data about data) S  What each device stores varies S  But you can download a free tool to read the metadata
S  http://www.sno.phy.queensu.ca/~phil/exiftool/

S  Data is stored against images, audio and video files by most

digital recording devices including cameras, phones, scanners. The data is known as EXIF data S  This data isn’t protected by your Facebook settings


What the photo told me:

S  S  S  S  S 

File name, size and type Date and Time created GPS co-ordinates - longitude, latitude & altitude Make & Model of the device used to take the photo Technical details about the photo including focal length, exposure, whether a flash was used, etc Whether the photo has subsequently been edited and if so when and by what application Copyright information could also have be added to the image

S  S 


What does all this add to the data stored by Facebook?
S  I can validate the date, time and location of the check-in on

Facebook
S  I can understand what type of device the user carries around S  I can understand a breach of copyright for certain materials


What about other sites?
- Facebook: 900M users
- Qzone (China): 480M users
- Twitter: 300M users
- Sina Weibo (China): 300M users
- Habbo (31 countries): 200M users
- Google+: 170M users
- Renren (China): 160M users
- Badoo (Europe & Latin America): 120M users
- LinkedIn: 120M users

- This is not a Facebook specific thing
- All sites allow developers to access the data
- Developer access is key to how organisations make money from social websites
- Many people put different data on different social websites
- Developers can use common data (e.g. an email address) to piece together an even deeper picture of an individual
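Piecing together a deeper picture from a common field such as an email address is, at its simplest, a join on that field. The Python sketch below merges invented records from two unnamed sites. The site labels, field names and example.com addresses are all hypothetical, and real matching has to cope with people using different addresses on different sites.

```python
from collections import defaultdict

# Hypothetical records from two different sites, sharing only an email address.
social = [{"email": "Jo@Example.com", "hometown": "Reading"}]
professional = [{"email": "jo@example.com ", "employer": "Example Ltd"}]


def merge_by_email(*sources):
    """Combine records from several (site_name, records) pairs, keyed by email."""
    merged = defaultdict(dict)
    for site, records in sources:
        for record in records:
            # Normalise the address so trivial differences do not block a match.
            key = record["email"].strip().lower()
            for field, value in record.items():
                if field != "email":
                    merged[key][f"{site}.{field}"] = value
    return dict(merged)


profiles = merge_by_email(("social", social), ("professional", professional))
print(profiles)
```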


Non-social (internal) data
S  Other organisations are gathering lots of data from internal

sources rather than social networks

S  Telematics devices for car insurance S  Smart metering devices for energy consumption S  Credit card transactions for fraud detection

S  These are being manipulated and analysed using the same

techniques

S  These are the ‘Big Data’ stories you read about in the press

Telematics Insurance

S  Buy cheap car insurance in exchange for having a ‘black box’ installed in your

car, known as a Telematics box

S  This sends data back to a central computer periodically S  Typically every couple of minutes/miles S  All the data every 100ms over a 2 second interval when there is an impact S  Minimum data set S  Longitude, Latitude, Altitude, X-Acceleration, Y-Acceleration, Z-Acceleration, Speed, Compass Direction Of Travel S  More advance units gather more data S  Camera data, Engine data, Service History, etc.


Telematics Plot

S  Trip from Wokingham to Walton-Upon-Thames S  Rendered on Google Maps with a KML file (Free to use)

Using Telematics Data

S  Assess customer driving pattern
S  Adjust the car insurance premium accordingly

S  Assess accidents
S  Can be used to determine fault in collisions S  Can be used to determine if whiplash is likely

S  Assess other types of car insurance fraud S  Allows insurance companies to “optimize” premiums
S  Charge as much as possible but be cheaper than the competition


Telematics Insurers in the UK

Source: http://comparethebox.com

Integrating Social Data and Non-Social Data
S  Organisations are starting to combine internal data with

social network data to create an even deeper understanding of the customer
S  All of the above examples given are from real projects that

we, as a company, have already been involved in


Integrated Data

S  A youth buys cheap telematics insurance …
S  When he gets it he ‘likes’ the product on on Facebook S  Positive Sentiment Analysis – Opportunity to thank customer S  When he gets charged for the top-up miles he ‘dislikes’ the cost S  Negative Sentiment Analysis – Opportunity to address concerns S  When he has an accident and tells his mates what really happened S  Fraud detection – Opportunity to check the veracity of the claim

S  What you say and do socially now will affect your commercial

transactions in the future


Can I Opt-Out?

S  No – you can limit your exposure but you can’t opt out of big data S  You don’t have to join social networks but:
S  Many social activities are based around Twitter/Facebook S  Most business people will want to use LinkedIn S  Peer pressure to join, especially for younger people, is high

S  Your data will be analysed by companies involved in
S  Marketing, Financial (especially underwriting & fraud), S  Energy consumption, and many more S  They will source the data internally and from social networks


What about crime?

- Most uses of social data are positive
  - Reduce fraud, improve product, more precisely targeted marketing, energy efficiency
- But criminals can use this technology too
  - Most of the technology is either low cost or free
  - New techniques for exploiting data evolve very quickly
  - Identity theft is just one possible outcome
- It’s an arms race – Can we (the good guys) find ways to protect ourselves and those that share their data with us faster than the bad guys develop techniques to exploit this information?
- Make sure you understand what you are sharing and with whom you are sharing data


Security

S  Remember
S  Set your privacy settings on Facebook S  Things that help people communicate with you (data of birth, first

school, first pet, mothers maiden name, etc.) are also the most common security questions for online banking, etc. S  Facebook friends are not real friends – beware of ‘friending’ people you don’t actually know and ‘liking’ dubious groups S  Remember your ‘friends’ may not be so in the future or may have greater loyalties to others than they do to you S  You may get profiled and targeted as a ‘false positive’ i.e. you aren’t interested in the product/offering but match the criteria


It’s not just social websites
S  Other sites also hold complex social information
S  Directory Websites: 192.com, company-director-check.co.uk S  Family History Websites: ancestry.co.uk, findmypast.com S  Large scale online retailers: amazon.com, apple.com, tesco.com


Who does this work?

S  Data Scientists S  A data scientist is a job title for an employee or business intelligence (BI) consultant who excels at analysing data, particularly large amounts of data, to help a business gain a competitive edge S  The position is gaining acceptance (and significant salaries) with large enterprises who are interested in deriving meaning from big data, the voluminous amount of structured, unstructured and semi-structured data that a large enterprise produces. S  A data scientist possesses a combination of analytic, machine learning, data mining and statistical skills as well as experience with algorithms and coding. Perhaps the most important skill a data scientist possesses, however, is the ability to explain the significance of data in a way that can be easily understood by others. S  Most often Maths or Computer Studies graduates with Business skills


Notes on this presentation

S  All trademarks and brand names are the property of their respective owners S  This presentation is designed to show capabilities, tools and techniques and is

in no way condoning or condemning any organisation, product, technology or tool

S  Other tools and products are available S  Data access may be restricted by user permissions S  Data access may be restricted by law S  Data access may be restricted by data provider terms & conditions


Contact Us

S  Data Management & Warehousing S  Website: http://www.datamgmt.com S  Telephone: +44 (0) 118 321 5930 S  David Walker S  E-Mail: davidw@datamgmt.com S  Telephone: +44 (0) 7990 594 372 S  Skype: datamgmt S  White Papers: http://scribd.com/davidmwalker


About Us

Data Management & Warehousing is a UK based consultancy that has been delivering successful business intelligence and data warehousing solutions since 1995. Our consultants have worked with major corporations around the world including the US, Europe, Africa and the Middle East. We have worked in many industry sectors such as telcos, manufacturing, retail, financial and transport. We provide governance and project management as well as expertise in the leading technologies.


Thank You
©2012 - Data Management & Warehousing http://www.datamgmt.com
