Objectives
After the completion of this lecture you will learn about:
◇Evaluation and Marking Scheme
◇The Internet and WWW
◇Social Media
◇Use of Social Media for Business and Marketing Purposes
◇Social Media Issues
◇Data Analytics
◇Social Media Analytics
◇Importance of Social Media Analytics
Evaluation
1. Evaluation methods
Evaluation Method Weight
Tests (2 @ 20%) 40%
Total 100%
2. Marking Scheme
94-100  A+
87-93   A
80-86   A-
77-79   B+
73-76   B
70-72   B-
67-69   C+
63-66   C
60-62   C-
50-59   D
0-49    F
The Internet
◇The Internet is a global web of computers connected to each other by
different networking media
◇It is the largest network in the world
◇It is often described as the information superhighway
◇Routers are used to connect different networks
◇No one owns the Internet
◇No organization formally manages it
◇The TCP/IP protocol suite governs communication on the Internet
The World Wide Web
(WWW)
◇WWW consists of information organized into Web pages
containing text and graphic images.
◇Different web resources are identified by Uniform Resource
Locators (URLs).
◇A website is a collection of linked web pages that has a
common theme or focus.
◇The main page is called the home page.
◇Websites are accessed through a company's LAN or through an
Internet Service Provider (ISP)
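To make the idea of a URL concrete, the short sketch below splits one into its parts with Python's standard library. The address is a made-up example, not one taken from the lecture.

```python
from urllib.parse import urlparse

# Hypothetical address, used only for illustration.
url = "https://www.example.com/courses/sma/index.html"

parts = urlparse(url)
scheme = parts.scheme   # protocol used to fetch the resource
host = parts.netloc     # domain name of the web server
path = parts.path       # location of the page on that server

print(scheme, host, path)
```

The scheme shows whether HTTP or HTTPS is used, and the host is the domain that a browser resolves to an IP address.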
Different Terminologies
◇Browser
◇Search Engine
◇URL
◇Domain
◇Hypertext Markup Language
◇Client/Server Architecture
◇IP Addressing
◇HTTP
◇HTTPS
What is Data?
◇Data is the raw material
◇Data is transformed into information and ultimately knowledge
◇Value creation:
Data Analytics
◇It is a process of inspecting, cleansing, transforming, and modeling data with the goal of
discovering useful information, suggesting conclusions, and supporting decision making
Conclusion
We have studied
◇The evolution of Internet and WWW
◇Characteristics of Social Media
◇Usage of Social Media for Business
◇Data Analytics
◇Social Media Analytics
◇Importance of Social Media Analytics
Review Questions
◇What are the don’ts of social media?
◇What is the difference between web 1.0 and web 2.0?
◇Compare Facebook, Twitter and Instagram in terms of total
number of users, total posts created/shared, photos uploaded,
etc.
◇ Which networking device do we use to connect different
networks?
◇Who is the owner of the Internet?
◇Which protocol suite defines how communication takes place on the
Internet?
◇What is WWW?
Review Questions..
◇How is a web resource identified in WWW?
◇What is HTML?
◇What is the difference between HTTP and HTTPS?
◇What are some dos of social media?
◇Which social media is used for Microblogging?
◇Which social media platform is used for discussion
forums?
◇What are the security issues for social media?
Review Questions..
◇How do you create value from data?
◇What are the steps of data analytics?
◇What are the different social media analytics tasks?
◇What do you use to analyze a social media platform?
◇How do you monitor social media?
◇Why do you want to measure sentiment?
◇Why do you want to understand your audiences?
Next session..
Lambton College
School of Computer Studies
Objectives
After the completion of this lecture you will learn about:
◇ Social Media Value Creation Model (VCM)
◇ Social Media Return on Investment (ROI)
◇ Social Media Value Metrics to measure ROI
◇ Social Media Engagement (SME) Metrics
◇ Social Media Influence (SMI) Metrics
◇ Social Media Popularity (SMP) Metrics
Labs
◇ BMI Analysis
◇ Normalization
◇ Parsing
Social Media Analytics Value Creation
Model
◇ Research by MIT Sloan Management found that 67% of survey respondents
reported that their companies gained a competitive advantage by
employing analytics
◇ Another study found that 56% of marketers do not know how to tie
social media to their business outcomes
◇ Creating value with social media analytics entails harnessing
cost-effective and commercially worthy insights from semi-structured and
unstructured social media data that can ideally lead to competitive
advantage
Social Media Analytics Value Creation
Model..
The value creation process is done through a set of activities:
◇ Defining what the value is
◇ Aligning the value creation with business objectives
◇ Capturing the value using analytics
◇ Sustaining the value for a long period
◇ Analytics infrastructure serves as the support activity for the value
creation model
Non-tangible V2F
◇ Brand awareness
◇ Brand name
◇ Brand loyalty
◇ Customer engagement
◇ Mass collaboration
◇ Crowd-sourcing
◇ Idea generation
◇ Connectivity
◇ Customer Satisfaction
◇ Website traffic, etc.
Non-tangible V2C
◇ Product awareness
◇ Brand association
◇ Brand Connectivity
◇ Brand involvement
◇ Service quality
◇ Information quality
◇ Product quality, etc.
Tangible V2C
◇ Discounts
◇ Competitive price
◇ Group buying
◇ Social buying
◇ Volume discount
◇ Promotions
◇ Low transaction cost
◇ Savings
◇ Easy buying, etc.
Value metric: Social Media Engagement
(SME)
◇ It is the overall responsiveness and interaction of a brand with its
customers through social media
◇ It comprises the communication connections between a brand and its
stakeholders through various social media channels
◇ The essential idea of SME is to encourage customers to interact and
share their experiences
◇ Active social media engagement measures how many of the total followers
are actively interacting with the content
◇ Since social media platforms vary significantly in scope,
engagement, and use, SME metrics must be platform-specific
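As a rough illustration of active engagement, the sketch below divides the followers who interacted by the total follower count. The function name and the numbers are assumptions for illustration; each platform defines its own engagement metrics.

```python
def engagement_rate(active_followers: int, total_followers: int) -> float:
    """Share of followers who interacted with the content (0.0 to 1.0)."""
    if total_followers <= 0:
        raise ValueError("total_followers must be positive")
    return active_followers / total_followers

# Hypothetical page: 150 of 6,000 followers liked, commented or shared.
rate = engagement_rate(active_followers=150, total_followers=6000)
print(f"{rate:.1%}")
```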
Conclusion
We have studied
◇ Social Media Value Creation Model (VCM)
◇ Social Media Return on Investment (ROI)
◇ Social Media Value Metrics to measure ROI
◇ Social Media Engagement (SME) Metrics
◇ Social Media Influence (SMI) Metrics
◇ Social Media Popularity (SMP) Metrics
Review Questions
◇What activities are done to create value?
◇State an example of a non-financial value for a firm.
◇State an example of a non-financial value for a customer.
◇State an example of a financial value for a customer.
◇What are the categories of value?
◇ State some examples of tangible V2F
◇ State some examples of non-tangible V2F
◇ State some examples of non-tangible V2C
Review Questions..
◇ State some examples of tangible V2C
◇State the difference between organic and non-organic ads.
◇How do you get social media returns?
◇What could be some SME metrics for Facebook?
◇What could be some SME metrics for Twitter?
◇What is the difference between SME and SMI?
◇Which metrics do we use for SMI measurement?
◇What do we use SMP for? For Twitter, what metrics can be used to
measure SMP?
Next session..
Lambton College
School of Computer Studies
Objectives
After the completion of this lecture you will learn about:
◇ Social Media Analytics
◇ The difference between Social Media Analytics and Business Analytics
◇ Different types of Social Media Analytics
◇ Challenges to Social Media Analytics
Labs
◇ Parsing
◇ Normalization
◇ One Hot Encoding Vector
◇ Veracity Calculation
What is Social Media
Analytics?..
Social Media Analytics “is the art and science of
extracting valuable hidden insights from vast amounts of
semi-structured and unstructured social media data to
enable informed and insightful decision making.”
art: It is an art of interpreting and aligning the insights
gained with business goals and objectives.
science: It is a science, as it involves systematically
identifying, extracting, and analyzing social media data
using sophisticated tools and techniques.
extracting: The data needs to be mined from public
databases
valuable hidden insights: The insights carry business value and
are not usually visible to the naked eye.
vast amounts of semi-structured and unstructured:
The data has the four Vs: volume, variety, velocity,
and veracity.
Copyright © 2018 Gohar F. Khan
Social Media Analytics vs Business
Analytics
Table 1. Social media vs. conventional business analytics (Khan, 2015)
SMA related terms (Google
trends)
Descriptive Analytics
◇ Descriptive analytics are carried out to answer
questions about events that have already occurred
Sample questions can include:
◇ What was the sales volume over the past 12 months?
◇ What is the number of support calls received as
categorized by severity and geographic location?
◇ What is the monthly commission earned by each sales
agent?
Descriptive Analytics..
◇ It is estimated that 80% of generated analytics results
are descriptive in nature
◇ Value-wise, descriptive analytics provide the least
worth and require a relatively basic skillset
Diagnostic Analytics
◇ Diagnostic analytics aim to determine the cause of a
phenomenon that occurred in the past using questions that focus
on the reason behind the event (Root Cause analysis)
Example questions include:
◇Why were some specific sales lower than other sales?
◇ Why have there been more support calls originating from the
Eastern region than from the Western region?
◇Why was there an increase in patient re-admission rates over
the past
Predictive Analytics
◇ Predictive analytics are carried out in an attempt to determine the outcome of
an event that might occur in the future
◇Predictive analytics try to predict the outcomes of events, and predictions are
made based on patterns, trends and exceptions found in historical and current
data.
Example questions include:
◇ What are the chances that a customer will default on a loan if they have
missed a monthly payment?
◇ What will be the patient survival rate if Drug B is administered instead of
Drug A?
◇ If a customer has purchased Products A and B, what are the chances that
they will also purchase Product C?
Prescriptive Analytics
◇ It answers the questions "What should we do?" and "Why
should we do it?"
◇ It is used when we have important, time-sensitive or
complex decisions to make
◇ For example, offering incentives to customers who are
likely to leave your business
Summary: Types of Social Media
Analytics
Case Study – Price Prediction
◇ Situation: A person brings her car to a car dealership
to sell
◇ Task: The dealership wants to estimate the price of
the car
◇ We can predict the price of the car using the price of
previous cars
Make    Model  Year  Mileage  Color  Horsepower  Sold Price
Toyota  Camry  2011  150K     White  180         $15,000
BMW     X3     2015  25K      Blue   270         $35,000
Honda   Civic  2005  350K     Black  145         $3,000
Ford    F150   2007  200K     Red    220         $16,000
Working with Categorical Data: One-hot
Encoding
◇ Many algorithms cannot work with categorical data directly
◇ The categories must be converted into numbers
◇ A one hot encoding is a representation of categorical variables
as binary vectors
◇ This first requires that the categorical values be mapped to
integer values
◇ Then, each integer value is represented as a binary vector
that is all zero values except the index of the integer, which is
marked with a 1
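The two steps above (category to integer, integer to binary vector) can be sketched as follows. This is a minimal illustration using the car colors from the case study; the helper name `one_hot_encode` and the alphabetical ordering of categories are assumptions, not part of the lecture.

```python
def one_hot_encode(values):
    """Map each category to an integer index, then to a binary vector."""
    categories = sorted(set(values))            # fixed, repeatable order
    index = {cat: i for i, cat in enumerate(categories)}
    vectors = []
    for value in values:
        vec = [0] * len(categories)             # all zeros ...
        vec[index[value]] = 1                   # ... except the category's index
        vectors.append(vec)
    return index, vectors

colors = ["White", "Blue", "Black", "Red"]
index, vectors = one_hot_encode(colors)
for color, vec in zip(colors, vectors):
    print(color, vec)
```

Each vector has exactly one 1, so no artificial ordering between the colors is introduced.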
Conclusion
We have studied
◇ Social Media Analytics
◇ The difference between Social Media Analytics and Business
Analytics
◇ Different types of Social Media Analytics
◇ Challenges to Social Media Analytics
◇ One-hot encoding vector
Review Questions
◇What are the challenges of Social Media Analytics?
◇What is meant by veracity?
◇When do we use one-hot encoding vectors?
◇What is the difference between SMA and BA in terms of data
type?
◇ What do you calculate to assess the quality of a dataset?
◇ Which type of social media analytics would you use to
answer questions such as "What happened?"
◇Which type of social media analytics would you use to
answer questions such as "Why did it happen?"
Review Questions..
◇ Which type of social media analytics would you use to
answer questions such as "What will happen?"
◇ Which type of social media analytics would you use to
answer questions such as "What do I do?"
◇ What is association analysis?
Next session..
Lambton College
School of Computer Studies
Objectives
After the completion of this lecture you will learn about:
◇ The Seven Layered approach to know your customer
◇ Eight layers of social media analytics
◇ Social Media Monitoring
◇ Social Media Listening
◇ Social Media Analytics Cycle
Labs
◇ Matrix Multiplication
◇ Traditional Matrix Factorization
◇ Neural Embedding based Matrix Factorization
A Seven Layer approach to Know your
customer
◇ Where are they? Location Analytics
◇ What do they say? Text Analytics
◇ What do they do? Actions Analytics
◇ What do they search for? Search Engine Analytics
◇ How do they network? Network Analytics
◇ How do they navigate? Hyperlink Analytics
◇ How do they use apps? Apps Analytics
[Figure: layers of social media analytics; recoverable labels: Multimedia, Text, Search Engines, Hyperlinks, Location]
[Figure: social media analytics cycle around Business Objectives; recoverable stages: Extraction, Cleaning, Analyzing, Visualization, Interpretation]
Identification
Framing the right question and knowing what data to analyze are
crucial to gaining useful business insights
◇ Searching for and identifying the right source of information for
analytical purposes
◇ Type of data needed (text, actions, search engine)
◇ Source of data (Twitter, Facebook)
◇ Public data vs. company-owned sources
Extraction
Methods, tools, and skills needed to extract the data.
◇ API-based extraction: Put simply, APIs are sets of
routines/protocols that social media service companies (e.g.,
Twitter and Facebook) have developed to let users
access small portions of the data hosted in their databases.
Cleaning
This step involves removing the unwanted data from the
automatically extracted data.
◇ Missing values
◇ Incorrect/false data
◇ Coding (e.g., Gender)
◇ Anonymize
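The cleaning tasks listed above can be sketched on a tiny made-up dataset. The records, field names and gender codes are assumptions for illustration; note that hashing a user name is only pseudonymization, and real anonymization usually needs more care.

```python
import hashlib

# Hypothetical extracted records.
raw = [
    {"user": "alice", "gender": "Female", "age": 29},
    {"user": "bob", "gender": "Male", "age": None},     # missing value
    {"user": "carol", "gender": "Female", "age": -5},   # incorrect value
]
GENDER_CODES = {"Female": 0, "Male": 1}                 # coding step

def clean(records):
    cleaned = []
    for rec in records:
        age = rec["age"]
        if age is None or age < 0:
            continue                                    # drop missing/incorrect rows
        cleaned.append({
            # replace the user name with a short hash so it is not stored in clear
            "user_id": hashlib.sha256(rec["user"].encode()).hexdigest()[:12],
            "gender": GENDER_CODES[rec["gender"]],
            "age": age,
        })
    return cleaned

cleaned = clean(raw)
print(cleaned)
```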
Analyzing
◇ Depending on the layer of social media analytics under
consideration and the tools and algorithms employed, the steps
and approach will vary greatly.
◇ The overall objective at this stage is to extract meaningful insights
without the data losing its integrity
Visualization
Effective visualization is particularly helpful with complex and
large data sets because it can reveal hidden patterns,
relationships, and trends.
◇ Network data (with whom)
◇Topical data (what)
◇Temporal data (when)
◇Geospatial data (where)
Conclusion
We have studied
◇ The Seven Layered approach to know your customer
◇ Eight layers of social media analytics
◇ Social Media Monitoring
◇ Social Media Listening
◇ Social Media Analytics Cycle
Review Questions
◇ What is the seven layer approach to know your customers?
◇What are the eight layers of social media analytics?
◇What kind of business insights can you extract from textual data?
◇What kind of business insights can you extract from network
data?
◇Which layer of social media analytics performs engagement tracking?
Review Questions..
◇ Which layer of social media analytics is known as spatial
analysis?
◇ Which layer of social media analytics includes advertisement
history to complete the analysis?
◇ What is the difference between social media monitoring and
listening?
◇ In social media analytics life cycle, what is done in the
identification step?
Next session..
Lambton College
School of Computer Studies
Objectives
After the completion of this lecture you will learn about:
◇ Value creation strategies to get value out of data
◇ Data cleaning techniques
◇Techniques to visualize data
Labs
◇ Traditional Matrix Factorization
◇ Neural Embedding based Matrix Factorization
Value creation stages
The value creation steps can be divided into the following nine
stages:
◇Business Case Evaluation
◇Data Identification
◇Data Acquisition & Filtering
◇Data Extraction
◇Data Validation & Cleansing
◇Data Aggregation & Representation
◇Data Analysis
◇Data Visualization
◇Utilization of Analysis Results
Data Identification
◇ The Data Identification stage is dedicated to identifying the datasets
required for the analysis project and their sources.
◇ Identifying a wider variety of data sources may increase the probability of
finding hidden patterns and correlations. For example, to provide insight, it
can be beneficial to identify as many types of related data sources as
possible, especially when it is unclear exactly what to look for.
◇ Depending on the business scope of the analysis project and nature of
the business problems being addressed, the required datasets and their
sources can be internal and/or external to the enterprise.
Data Identification..
◇In the case of internal datasets, a list of available datasets from
internal sources, such as data marts and operational systems,
is typically compiled.
◇ In the case of external datasets, a list of possible third-party
data providers, such as data markets and publicly available
datasets, is compiled. Some forms of external data may be
embedded within blogs or other types of content-based web
sites, in which case they may need to be harvested via
automated tools.
Data Acquisition and
Filtering
◇During the Data Acquisition and Filtering stage, the data is
gathered from all of the data sources that were identified during
the previous stage.
◇The acquired data is then subjected to automated filtering to
retain the data relevant to the analysis
◇ Depending on the type of data source, data may come as a
collection of files, such as data purchased from a third-party data
provider, or may require API integration, such as with Twitter.
Data Acquisition and
Filtering..
◇ In many cases, especially where external, unstructured data is
concerned, some or most of the acquired data may be irrelevant
(noise) and can be discarded as part of the filtering process.
◇ Data classified as “corrupt” can include records with missing
or nonsensical values or invalid data types. Data that is filtered
out for one analysis may possibly be valuable for a different type
of analysis.
◇ Therefore, it is advisable to store a verbatim copy of the
original dataset before proceeding with the filtering. To minimize
the required storage space, the verbatim copy can be
compressed.
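A minimal sketch of keeping a compressed verbatim copy before filtering is shown below. It works in memory with made-up records; writing the archive to disk or other storage is left out, and the filtering rule (drop records with no text) is only an assumed example.

```python
import gzip
import json

# Hypothetical acquired records, one of them corrupt (missing text).
records = [
    {"id": 1, "text": "Great service!"},
    {"id": 2, "text": None},
    {"id": 3, "text": "Slow delivery"},
]

# 1. Archive a verbatim, compressed copy of the original dataset.
original_bytes = json.dumps(records).encode("utf-8")
archived = gzip.compress(original_bytes)

# 2. Only then filter out records that are unusable for this analysis.
filtered = [r for r in records if r["text"] is not None]

# 3. The archive can be restored later for a different type of analysis.
restored = json.loads(gzip.decompress(archived).decode("utf-8"))
print(len(filtered), len(restored))
```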
Data Acquisition and
Filtering..
◇ Both internal and external data need to be persisted once they are
generated or enter the enterprise boundary.
◇ For batch analytics, this data is persisted to disk prior to analysis. In
the case of real time analytics, the data is analyzed first and then
persisted to disk.
◇ Metadata can be added via automation to data from both internal
and external data sources to improve the classification and querying
◇ Examples of appended metadata include dataset size and
structure, source information, date and time of creation or collection
and language-specific information.
Data Extraction
◇ Some of the data identified as input for the analysis may arrive in
a format incompatible with the solution model. The need to address
disparate types of data is more likely with data from external
sources.
◇ The Data Extraction stage is dedicated to extracting data and
transforming it into a format that the underlying solution can use for
the purpose of the data analysis
◇ The extent of extraction and transformation required depends on
the types of analytics and capabilities of the solution. For example,
extracting the required fields from delimited textual data, such as
with webserver log files, may not be necessary if the underlying
data solution can already directly process those files.
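As an illustration of extracting fields from delimited text, the sketch below parses two made-up log lines. The space-delimited layout and the field names are assumptions; real web server log formats differ and usually need a more careful parser.

```python
# Hypothetical, simplified log lines: ip timestamp method path status
log_lines = [
    "203.0.113.7 2018-03-01T10:15:00 GET /index.html 200",
    "198.51.100.2 2018-03-01T10:15:04 GET /missing.html 404",
]
FIELDS = ["ip", "timestamp", "method", "path", "status"]

def extract(line):
    """Turn one delimited line into a record the analysis can use."""
    record = dict(zip(FIELDS, line.split(" ")))
    record["status"] = int(record["status"])   # transform text into a number
    return record

rows = [extract(line) for line in log_lines]
print(rows[1]["path"], rows[1]["status"])
```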
Data Validation and
Cleansing
◇ Invalid data can falsify analysis results. Unlike
traditional enterprise data, where the data structure is
pre-defined and data is pre-validated, data for social data
analytics can be unstructured without any indication of
validity. Its complexity can further make it difficult to
arrive at a set of suitable validation constraints.
◇ The Data Validation and Cleansing stage is dedicated
to establishing often complex validation rules and
removing any known invalid data.
Data Aggregation and
Representation
◇ Data may be spread across multiple datasets,
requiring that datasets be joined together via common
fields, for example date or ID. In other cases, the same
data fields may appear in multiple datasets, such as date
of birth. Either way, a method of data reconciliation is
required or the dataset representing the correct value
needs to be determined.
◇ The Data Aggregation and Representation stage,
shown in the next figure, is dedicated to integrating
multiple datasets together to arrive at a unified view.
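Joining datasets on a common field can be sketched as below. The two small datasets and the helper `join_on_id` are made up for illustration; a real project would more likely use a database join or a data-frame library.

```python
# Hypothetical datasets that share the "id" field.
profiles = [
    {"id": 1, "name": "Ana", "date_of_birth": "1990-04-12"},
    {"id": 2, "name": "Ben", "date_of_birth": "1985-11-03"},
]
activity = [
    {"id": 1, "posts": 14},
    {"id": 2, "posts": 3},
    {"id": 3, "posts": 7},   # no matching profile, so it is left out
]

def join_on_id(left, right):
    """Inner join: keep rows whose id appears in both datasets."""
    right_by_id = {row["id"]: row for row in right}
    merged = []
    for row in left:
        match = right_by_id.get(row["id"])
        if match is not None:
            merged.append({**row, **match})
    return merged

unified = join_on_id(profiles, activity)
print(unified)
```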
Data Aggregation and
Representation..
Performing this stage can become complicated because
of differences in:
◇ Data Structure
◇ Semantics – A value that is labeled differently in two
different datasets may mean the same thing, for example
“surname” and “last name.”
Data Analysis
◇ The Data Analysis stage is dedicated to carrying out the actual analysis
task, which typically involves one or more types of analytics.
◇ This stage can be iterative in nature, especially if the data analysis is
exploratory, in which case analysis is repeated until the appropriate pattern
is retrieved.
◇ Depending on the type of analytic result required, this stage can be as
simple as querying a dataset to compute an aggregation for comparison. On
the other hand, it can be as challenging as combining data mining and
complex statistical analysis techniques to discover patterns or to generate a
statistical or mathematical model to depict relationships between variables.
◇ Data analysis can be classified as confirmatory analysis or exploratory
analysis, the latter of which is linked to data mining
Data Analysis…
◇ Confirmatory data analysis is a deductive approach
where the cause of the phenomenon being investigated
is proposed beforehand. The proposed cause or
assumption is called a hypothesis.
◇ The data is then analyzed to prove or disprove the
hypothesis and provide definitive answers to specific
questions. Data sampling techniques are typically used.
Data Analysis…
◇ Exploratory data analysis is an inductive approach that
is closely associated with data mining. No hypothesis or
predetermined assumptions are generated.
◇ Instead, the data is explored through analysis to
develop an understanding of the cause of the
phenomenon. Although it may not provide definitive
answers, this method provides a general direction that
can facilitate the discovery of patterns or anomalies.
Data Visualization
◇ The ability to analyze massive amounts of data and find useful
insights carries little value if the only ones that can interpret the
results are the analysts.
◇ The Data Visualization stage is dedicated to using data
visualization techniques and tools to graphically communicate the
analysis results for effective interpretation by business users.
◇ Business users need to be able to understand the results in
order to obtain value from the analysis and subsequently have
the ability to provide feedback, as indicated by the dashed line
leading from stage 8 back to stage 7
Data Visualization
◇ The results of completing the Data Visualization stage provide
users with the ability to perform visual analysis, allowing for the
discovery of answers to questions that users have not yet even
formulated
◇ The same results may be presented in a number of different
ways, which can influence the interpretation of the results.
Consequently, it is important to use the most suitable
visualization technique by keeping the business domain in
context.
◇ Another aspect to keep in mind is that providing a method of
drilling down to comparatively simple statistics is crucial, in order
for users to understand how the rolled up or aggregated results
Data cleaning: How to Handle Missing
Values?..
◇ Global Estimation:
○ the attribute mean/median for numeric attributes
○ the most probable value for symbolic (i.e. categorical) attributes
◇ Local Estimation:
○ the attribute mean/median for all the tuples belonging to the same
class (for numeric attributes)
○ the most probable value within the same class (for symbolic
attributes)
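The difference between the two estimates can be seen on a small numeric example. The rows, the class labels and the attribute name `income` are made up for illustration.

```python
from statistics import mean

# Hypothetical tuples; one income value is missing.
rows = [
    {"cls": "A", "income": 40.0},
    {"cls": "A", "income": 60.0},
    {"cls": "A", "income": None},     # value to be filled in
    {"cls": "B", "income": 100.0},
    {"cls": "B", "income": 120.0},
]

# Global estimation: mean of the attribute over all known values.
known = [r["income"] for r in rows if r["income"] is not None]
global_estimate = mean(known)

def local_estimate(rows, cls):
    """Local estimation: mean over tuples of the same class only."""
    same_class = [r["income"] for r in rows
                  if r["cls"] == cls and r["income"] is not None]
    return mean(same_class)

local = local_estimate(rows, "A")
print(global_estimate, local)
```

Here the local estimate stays close to the other class A values, while the global estimate is pulled up by class B.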
Cluster Analysis
Regression Analysis
◇ Fit the data to a function
◇ Data points too far away from the function are outliers.
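A minimal sketch of this idea fits a straight line by least squares and flags points whose residual is unusually large. The data and the cut-off of two standard deviations of the residuals are assumptions for illustration; the lecture does not fix a threshold.

```python
from statistics import mean, pstdev

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def find_outliers(xs, ys, k=2.0):
    """Indices of points lying more than k residual std devs from the line."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    spread = pstdev(residuals)
    return [i for i, r in enumerate(residuals) if abs(r) > k * spread]

# Made-up data: roughly y = 2x, with one suspicious value at x = 5.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8, 30, 12]
outliers = find_outliers(xs, ys)
print(outliers)
```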
Conclusion
We have studied
◇ Value creation strategies to get value out of data
◇ Data cleaning techniques
◇ Techniques to visualize data
Review Questions
◇How does the process of value creation start?
◇What is done in the data extraction process?
◇From which stage do we go back to the data analysis stage
again?
◇What is a confirmatory analysis?
◇What is exploratory analysis?
◇What is a model?
Review Questions
◇While classifying, if a class label is missing then what
do we do with that tuple?
◇What is global estimation?
◇What is local estimation?
◇Give an example of a supervised learning technique.
◇Give an example of an unsupervised learning technique.
Lambton College
School of Computer Studies
Objectives
After the completion of this lecture, you will learn about:
◇ Text Analytics
◇ Types of Social Media Text
◇ Reasons for Text Analytics
◇ Steps of Text Analytics
Classwork:
◇ Please read the paper on Neural Collaborative Filtering
◇ Create a summary slide and present it to the class
Lab: LDA Modeling
80% of Data is Unstructured
◇Database notes:
○ Call center transcripts
○ Other CRM
◇Email
◇Open-ended survey responses
◇Web pages
◇Newsgroups
◇Documents themselves
◇Competitive information
◇Reviews, tweets, comments
◇Photos, videos, infographics
Text Industry
Deployment Models
◇ On-premise model
• It is a comparatively expensive option, but it provides extra security and
control
Key Players
◇IBM (Watson Analytics)
◇Lexalytics
◇Microsoft (Text analytics APIs)
◇SAS Text
◇SAP
Applications for Text
Analytics
◇ Social media data
◇Surveys
◇‘Reading’ email
◇Call centre data
◇Abstracts
◇Document management
◇Corporate history
◇Scientific publications
◇Thematic understanding of website
◇Database notes
[Figure: Purpose of Social Media Text Analytics; recoverable labels: Concept Mining, Trends Mining, Intention Mining]
[Figure: recoverable labels: Operational Systems, Customer Data, Data Collection, Extraction, Attributes, Fraud Prediction, Expert UI, Business UI, Business User]
[Figure: steps of text analytics: Source Identification, Text Parsing & Filtering, Text Transformation, Text Mining]
• Terms count
• Frequency count
• Co-occurrence metrics
• Stemming
• Parts of speech
• Named entities extraction
• Stop words
• Filtering
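A few of these operations (tokenizing, stop word removal, stemming and term counting) can be sketched with the standard library. The stop word list and the suffix-stripping stemmer are deliberately crude assumptions; real projects would normally rely on an NLP library.

```python
import re
from collections import Counter

# Tiny illustrative stop word list (an assumption, not a standard list).
STOP_WORDS = {"the", "is", "a", "and", "of", "to", "are"}

def naive_stem(word):
    """Very rough stemming: strip a few common suffixes."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def term_counts(text):
    tokens = re.findall(r"[a-z]+", text.lower())           # parsing
    kept = [naive_stem(t) for t in tokens if t not in STOP_WORDS]  # filtering
    return Counter(kept)                                    # term counts

counts = term_counts(
    "The customers are sharing reviews and the reviews are helping customers"
)
print(counts.most_common(2))
```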
Conclusion
We have studied
◇ Text Analytics
◇ Types of Social Media Text
◇ Reasons for Text Analytics
◇ Steps of Text Analytics
Lambton College
School of Computer Studies
Objectives
After the completion of this lecture, you will learn about:
◇ What is a recommender system?
◇ How does a recommender system work?
◇ Different types of Recommender Systems
◇ Collaborative Filtering Based Recommender System
◇ Content-based Recommender System
◇ Hybrid Recommender System
◇ Text Mining and Recommender Systems
Recommender System
◇ Recommender systems (RS) help to match users with items to
ease information overload
◇ They are information filtering systems that recommend items to
different users based on users’ preferences, interests and behaviors
◇ For example: Related movie recommendation in Netflix
Recommender Systems..
◇ Recommender systems reduce information overload by
estimating relevance
Types of Recommender Systems
◇ Collaborative filtering (CF) based Recommender System:
Recommends items by identifying other users with similar
interests; it uses a rating matrix
◇ Content-based (CT) Recommender System: Recommends
items based on content similarity. Different features of items
are used as side information to design the recommender
system
◇ Knowledge-based Recommender Systems: Recommend
items based on explicit knowledge about the items and user
preferences
◇ Group Recommender Systems: Recommend items
depending on the group attachment of a user
◇ Hybrid Recommender System: Combines CF- and CT-based
techniques
Collaborative Filtering Based Recommender
Systems
◇ Tell me what's popular among my peers
Content-Based Recommender Systems
◇ Show me more of what I've liked
Knowledge-based Recommender Systems
◇ Tell me what fits based on my needs
Group Recommender Systems
◇ Here recommendation is provided depending on the group
attachment for a user
Hybrid Recommender Systems
◇ Hybrid recommender systems combine two or more
recommendation techniques in order to increase the overall
performance
◇ The main idea is using multiple recommendation techniques to
suppress the drawbacks of an individual technique in a combined
model
Types of Feedback
◇ Explicit Feedback:
○ Explicit ratings (usually 1 to 5) given by users
◇ Implicit Feedback:
○ Clicks, page views, time spent on page, downloads …
○ Can be used when explicit feedback is missing, or in addition to explicit
feedback
○ Social Network shares (impressions/likes/replies/etc.)
○ Interaction Data (Comment/Like/Share)
Calculation of Implicit Feedback
Score
◇ $FS_{ui} = W_L \cdot \text{totalLike} + W_c \cdot \text{totalComment} + W_s \cdot \text{totalShare} + W_i \cdot \text{impressions} + W_{tc} \cdot \text{totalClickthrough} + W_{rts} \cdot \text{reTweetreShareScore} + \dots$
◇ Here:
$FS_{ui}$ = Feedback score for one user on one item
$W_i$ = Weight for impressions
$W_L$ = Weight for likes
$W_c$ = Weight for comments
$W_s$ = Weight for shares
$W_{tc}$ = Weight for click-throughs
$W_{rts}$ = Weight for Twitter retweets and Facebook reshares
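The weighted sum above can be sketched as a small function. The weight values and the interaction counts are made-up assumptions; in practice the weights would be chosen or tuned for the platform at hand.

```python
# Hypothetical weights: stronger signals get larger weights.
WEIGHTS = {
    "likes": 1.0,
    "comments": 2.0,
    "shares": 3.0,
    "impressions": 0.5,
    "clickthroughs": 1.0,
    "reshares": 3.0,
}

def feedback_score(interactions, weights=WEIGHTS):
    """Weighted sum of one user's interactions with one item."""
    return sum(w * interactions.get(name, 0) for name, w in weights.items())

# One user on one item: 4 likes, 1 comment, 2 shares, 10 impressions.
score = feedback_score(
    {"likes": 4, "comments": 1, "shares": 2, "impressions": 10}
)
print(score)
```

Interaction types that are missing for a user simply contribute zero to the score.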
Matrix Factorization
◇ Given an n × m matrix R with some entries unknown
o n rows represent n users
o m columns represent m items
o Entry R_ij represents the i-th user’s rating on the j-th item
◇ We are interested in the possible values of the unknown entries
o i.e., predicting users’ ratings
◇ We can model the problem as R ≈ U * V
◇ Here, U and V are two feature matrices for users and items
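One simple way to learn U and V is stochastic gradient descent on the known entries only. The sketch below is an assumption-laden illustration, not the method used in the labs: the number of latent features `k`, the learning rate, the regularization strength, the number of steps and the rating matrix are all made up. U is taken as n × k and V as k × m.

```python
import random

def factorize(R, k=2, steps=2000, lr=0.01, reg=0.02, seed=0):
    """Learn U (n x k) and V (k x m) so that U * V matches the known entries of R."""
    rng = random.Random(seed)
    n, m = len(R), len(R[0])
    U = [[rng.random() for _ in range(k)] for _ in range(n)]
    V = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(steps):
        for i in range(n):
            for j in range(m):
                if R[i][j] is None:          # unknown rating: skip
                    continue
                pred = sum(U[i][f] * V[f][j] for f in range(k))
                err = R[i][j] - pred
                for f in range(k):
                    u, v = U[i][f], V[f][j]
                    U[i][f] += lr * (err * v - reg * u)
                    V[f][j] += lr * (err * u - reg * v)
    return U, V

def predict(U, V, i, j):
    return sum(U[i][f] * V[f][j] for f in range(len(V)))

# Hypothetical ratings (users x items); None marks an unknown entry.
R = [
    [5, 3, None, 1],
    [4, None, None, 1],
    [1, 1, None, 5],
    [1, None, None, 4],
    [None, 1, 5, 4],
]
U, V = factorize(R)
print(round(predict(U, V, 1, 1), 2))   # estimate for an unknown entry
```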
Content-based recommendation
◇ Collaborative filtering does NOT require any information about the items.
It focuses only on the users’ feedback
○ If we include item information in the calculation, we can
recommend items based on their contents. If a user likes
soccer, we can recommend soccer-related posts.
◇ The objective here is:
o To build the user’s profile based on the contents of the items the user has
interacted with before
o To recommend items that are similar to the user’s profile
Text Mining
◇ It seeks novel and useful patterns in the data
◇ It mostly deals with unstructured data: word documents, PDF files, text
excerpts, etc.
◇ To perform text mining, first impose structure on the data, then mine the
structured data
◇ Benefits of text mining are obvious especially in text-rich data
environments
○ law (court orders), academic research (research articles), finance
(quarterly reports), medicine (discharge summaries), biology (molecular
interactions), technology (patent files), marketing (customer comments),
etc.
◇ Applications: document recommendation, spam filtering, e-mail prioritization,
etc.
Text Representation
◇ A document is represented using a feature vector
◇ A collection of documents is called a corpus
◇ A document is composed of individual tokens or terms
◇ Each document is one instance
◇ Bag of Words:
○ Treat every document as just a collection of individual words
○ Ignore word order, grammar, sentence structure, and punctuation
○ Each word in the document is treated as a keyword of the document
○ Each feature (word) is represented by a one (present) or a zero (absent)
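A minimal sketch of the binary bag-of-words representation described above. The tokenizer (lowercasing and keeping alphabetic runs only) and the function names are assumptions made for illustration.

```python
import re

def tokenize(text):
    """Lowercase the text and keep only alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def binary_bag_of_words(corpus):
    """Return the sorted vocabulary and one 0/1 feature vector per document."""
    vocabulary = sorted({word for doc in corpus for word in tokenize(doc)})
    vectors = []
    for doc in corpus:
        words = set(tokenize(doc))
        vectors.append([1 if term in words else 0 for term in vocabulary])
    return vocabulary, vectors

corpus = ["Text mining finds patterns.", "Mining text is fun!"]
vocabulary, vectors = binary_bag_of_words(corpus)
print(vocabulary)
for vector in vectors:
    print(vector)
```

Both documents get a 1 for "text" and "mining" even though the words appear in a different order, which is exactly the information the bag-of-words model discards.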
◇ The inverse document frequency (IDF) of a term shows how significant that term is in the
entire collection of documents; rarer words get a higher IDF score
TF-IDF
◇ A model to represent a document
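The sketch below shows one common TF-IDF formulation. It assumes term frequency is the term count divided by document length and IDF is log(N / df), where N is the number of documents and df the number of documents containing the term; other variants exist, and the exact formula used in the lecture is not shown here.

```python
import math
from collections import Counter

def tf_idf(corpus):
    """corpus: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(corpus)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    idf = {term: math.log(n_docs / df) for term, df in doc_freq.items()}
    weights = []
    for doc in corpus:
        counts = Counter(doc)
        weights.append({t: (c / len(doc)) * idf[t] for t, c in counts.items()})
    return weights

corpus = [
    ["soccer", "match", "tonight"],
    ["soccer", "transfer", "news"],
    ["election", "news", "tonight"],
]
weights = tf_idf(corpus)
for doc_weights in weights:
    print(doc_weights)
```

In the first document, "match" appears in only one document and therefore gets a higher weight than "soccer", which appears in two.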
N-gram Sequences
◇ To preserve word order, we use N-gram sequences
◇ Includes sequences of adjacent words as terms. Adjacent pairs are commonly called
bigrams or 2-grams
◇ Example: “Text mining finds hidden patterns” will be transformed into:
◇ {text, mining, finds, hidden, patterns, text_mining, mining_finds, finds_hidden,
hidden_patterns}
◇ Can you find any 3-grams here?
◇ Do you see any problems here?
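The transformation in the example can be sketched as follows. The helper name `ngrams` and the underscore joining convention are taken from the example's notation; everything else is an illustrative assumption.

```python
def ngrams(tokens, n):
    """Join every run of n adjacent tokens with an underscore."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "Text mining finds hidden patterns".lower().split()
terms = tokens + ngrams(tokens, 2)   # single words plus bigrams
print(terms)
```

The same helper can be called with `n=3` to explore the questions above.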
N-gram..
◇ Which one is a bi-gram for the following sentence:
“She walks very fast”
1. {she, fast}
2. {walks, fast}
3. {she, very}
4. {walks, very}
Profile Similarity For
Recommendation
◇ Now, if a user reads a document d, we can find similar documents by using
different similarity algorithms. For example, we can use cosine similarity to find
similar documents and recommend them to the user
◇ Cosine similarity is a formula that measures the similarity of two
posts/articles. It is also known as vector-based similarity.
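A minimal sketch of cosine similarity over document feature vectors, followed by a ranking of candidate documents. The vectors and document names are made up for illustration, and returning 0.0 for an all-zero vector is a convention chosen here.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for empty documents
    return dot / (norm_a * norm_b)

def most_similar(target, candidates):
    """Rank candidate document names by similarity to the target vector."""
    return sorted(candidates,
                  key=lambda name: cosine_similarity(target, candidates[name]),
                  reverse=True)

read_doc = [1, 1, 0, 1]   # vector of the document the user has read
candidates = {"d1": [1, 1, 0, 0], "d2": [0, 0, 1, 0], "d3": [1, 1, 0, 1]}
ranking = most_similar(read_doc, candidates)
print(ranking)
```

Documents at the top of the ranking are the ones a content-based recommender would suggest first.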
Conclusion
We have studied
◇ What is a recommender system?
◇ How does a recommender system work?
◇ Different types of Recommender Systems
◇ Collaborative Filtering Based Recommender System
◇ Content-based Recommender System
◇ Hybrid Recommender System
◇ Text Mining and Recommender Systems
Review Questions
◇What is the input to a collaborative filtering-based recommender system?
◇What is the input to a content-based recommender system?
◇What is the input to a hybrid recommender system?
◇When is a group recommender system beneficial?
◇What is done to apply matrix factorization to a User-Item rating Matrix R?
◇What is the difference between explicit feedback and implicit feedback?
◇ Say you collect comments given by the users on a movie. You calculate
and use the sentiment score as the rating for that movie. Is this rating
implicit or explicit feedback?
Review Questions..
◇What is the difference between N-gram sequences and "Bag of Words"
approach?
◇What is a problem for N-gram sequences?
◇What is a benefit of using N-gram sequences?
◇What does the cosine similarity algorithm do?
Lambton College
School of Computer Studies
Common Network Terms:
Network
◇ At a very basic level, a network is a group of nodes that
are connected with links
◇ Nodes can represent:
• Individuals
• Organizations
• Countries
• Computers
• Websites, etc.
Common Network Terms:
Network..
◇ Links represent the relationship among the nodes
• Friendship
• Trade
• Authorship
• Hyperlinks, etc.
Common Network Terms: Social
Networks
◇ A social network is a group of nodes and links formed by
social entities where nodes can represent social entities
such as people and organizations
◇ Real-World Social Networks: A network among classmates
is an example of a real-world social network.
◇ Online Social Networks: A Twitter follower-following network is
an example of an online social network.
Common Network Terms: Social Network
Site
◇ A social network site is special-purpose software (or a social
media tool) designed to facilitate the creation and maintenance of
social relations
◇ Examples:
• Facebook,
• Google+,
• LinkedIn, etc.
Common Network Terms: Social
Networking
◇ The act of forming, expanding, and maintaining social
relations is called social networking
◇ Using social network sites, users can, for example, form,
expand, and maintain online social ties with family, friends,
colleagues, and sometimes strangers
Common Network Terms: Social Network
Analysis
◇ Social network analysis is the science of studying and
understanding social networks
◇ Has roots in a variety of fields:
• Graph Theory
• Sociology
• Information Science
• Communication Science
Network Structure
◇ A variety of network structures exist:
• Random network
• Scale-free network
• Centralized networks, etc.
◇ Network structures differ in their node degree
distribution
◇ Degree of a node measures the number of links a node has to
the other nodes in a network
Network Structures based on Degree
distribution
◇ Degree distribution is the probability distribution of
node degrees over the whole network
◇ Degree distribution tries to capture the difference in the
degree of connectivity between nodes in a graph
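The degree of each node and the resulting degree distribution can be sketched as below for a small undirected network. The edge-list representation and the function names are assumptions for illustration; nodes that have no links at all would need to be added separately because they never appear in an edge list.

```python
from collections import Counter

def degrees(edges):
    """Number of links per node in an undirected network given as (u, v) pairs."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def degree_distribution(edges):
    """Map each degree value k to the fraction of nodes that have degree k."""
    deg = degrees(edges)
    counts = Counter(deg.values())
    return {k: c / len(deg) for k, c in sorted(counts.items())}

edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
print(dict(degrees(edges)))
print(degree_distribution(edges))
```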
Random Network
◇ A network with a normal (homogeneous) degree distribution; it does not
have a distinct pattern.
Scale-free network
◇ In a scale-free network, a few central nodes control the flow of data
◇ Facebook and Twitter social networks (few people with many connections)
◇ Few websites having more in-links (Google and Yahoo)
◇ Citation networks (few scholars with many citations)
[Figure: Banking activity network; node size represents financial assets and links represent flow of capital.]
Types of Networks
◇ The networks can be classified in a variety of ways:
■ based on existence,
■ based on direction of links,
■ based on mode, and
■ based on weights.
Types of Networks..
Degree Centrality
◇ Degree centrality of a node measures the number of links a node has to the other
nodes in a network.
◇ In a Facebook network, for example, this will measure the number of friends that a
member has.
◇ In a Twitter network, it will equate to the number of followers or following a user has.
◇ In a directed network, the degree can be either in-degree (followers) or out-degree
(following).
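For a directed network such as Twitter, in-degree and out-degree can be counted separately. The sketch below assumes the network is given as hypothetical (follower, followed) pairs; the user names are made up.

```python
from collections import Counter

def in_out_degree(follows):
    """follows: (follower, followed) pairs in a directed network."""
    in_deg, out_deg = Counter(), Counter()
    for follower, followed in follows:
        out_deg[follower] += 1   # accounts this user is following
        in_deg[followed] += 1    # followers this user has
    return in_deg, out_deg

follows = [("ann", "bob"), ("cat", "bob"), ("bob", "ann"), ("dan", "bob")]
in_deg, out_deg = in_out_degree(follows)
print(in_deg["bob"], out_deg["bob"])
```

Here "bob" has three followers (in-degree) but follows only one account (out-degree).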
Betweenness Centrality
◇ Betweenness Centrality of a node is related to the centrality (or position) it has in a
network. Nodes with high betweenness centrality can control the flow of information
between connected nodes due to their central position in the network.
Betweenness Centrality
Eigenvector Centrality
◇ Eigenvector Centrality of a node measures the importance of a node
based on its connections with other important nodes in a network. It can
provide an understanding of a node’s networking ability relative to that of
others
E.g., the Google search engine uses eigenvectors to rank websites based on the importance of their in-links
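Eigenvector centrality can be approximated with power iteration: each node repeatedly takes on the sum of its neighbours' scores, and the scores are rescaled after every round. The sketch below is a simplified illustration for a small undirected network and is not Google's actual ranking algorithm; the function name, the iteration count, and the example graph are assumptions.

```python
def eigenvector_centrality(adjacency, iterations=100):
    """adjacency: {node: set of neighbours} for an undirected network."""
    scores = {node: 1.0 for node in adjacency}
    for _ in range(iterations):
        # Each node keeps its own score and adds its neighbours' scores
        # (iterating with A + I avoids oscillation on bipartite graphs).
        new = {n: scores[n] + sum(scores[m] for m in adjacency[n])
               for n in adjacency}
        norm = sum(v * v for v in new.values()) ** 0.5
        scores = {n: v / norm for n, v in new.items()}
    return scores

adjacency = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
scores = eigenvector_centrality(adjacency)
for node, score in sorted(scores.items()):
    print(node, round(score, 3))
```

Node "a" scores highest because it is linked to every other node, while "d" scores lowest even though its only neighbour is the most important node.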
Density
◇ The density of a network deals with the number of links in the
network. It is the number of links present in a network divided by the
number of all possible links
◇ For an undirected network, the number of all possible links
can be calculated as n (n – 1)/2, where n is the number of nodes
in the network.
◇ A fully connected network, in which each node is connected to
every other node, will have a density of 1.
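The density calculation for an undirected network can be sketched directly from the definition. The function name and the edge-list input are illustrative assumptions.

```python
def density(n_nodes, edges):
    """Links present divided by all possible links, n(n - 1)/2 (undirected)."""
    if n_nodes < 2:
        return 0.0  # no links are possible with fewer than two nodes
    possible = n_nodes * (n_nodes - 1) / 2
    # frozenset treats (a, b) and (b, a) as the same undirected link.
    present = len({frozenset(edge) for edge in edges})
    return present / possible

partial = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
complete = [("a", "b"), ("a", "c"), ("b", "c")]
print(density(4, partial))    # 4 of 6 possible links
print(density(3, complete))   # fully connected network
```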
Density
Network Components
◇ Components of a network are the isolated sub-networks that are connected within but
disconnected from other sub-networks (Hanneman and Riddle 2005).
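Components can be found with a breadth-first search that starts from each node not yet visited. This is a minimal sketch; the adjacency-dictionary representation and the function name are assumptions.

```python
from collections import deque

def components(adjacency):
    """Return the connected components of an undirected network as sets of nodes."""
    seen, result = set(), []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), {start}
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    members.add(neighbour)
                    queue.append(neighbour)
        result.append(members)
    return result

adjacency = {"a": {"b"}, "b": {"a"}, "c": {"d"}, "d": {"c"}, "e": set()}
parts = components(adjacency)
print(parts)
```

The example network has three components: two linked pairs and one isolated node.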
Diameter
◇ The diameter of a network is the largest of all the
calculated shortest paths between any pair of nodes in a
network (Wasserman and Faust 1994).
◇ It can provide an idea of how long it would take for
some information/ideas/message to pass through the
network
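For an unweighted network, the diameter can be computed by running a breadth-first search from every node and keeping the longest shortest path found. The sketch assumes the network is connected and is given as an adjacency dictionary; the function names are illustrative.

```python
from collections import deque

def shortest_path_lengths(adjacency, source):
    """Breadth-first search: number of links from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def diameter(adjacency):
    """Longest shortest path; assumes the network is connected."""
    return max(max(shortest_path_lengths(adjacency, s).values())
               for s in adjacency)

# A simple chain a - b - c - d.
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(diameter(chain))
```

In the chain, a message needs three hops to travel between the two end nodes, so the diameter is 3.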
Lambton College
School of Computer Studies
Objectives
◇ To understand Action Analytics
◇ Describe different types of actions
◇ To understand Search Engine Analytics
◇ Search Engine KPIs
Search Engine Optimization
(SEO)
◇ A Search Engine Results Page (SERP) generally has two types
of results:
o Organic or free: Social Presence, Social Content,
Bookmarking, Keywords, Meta tags, etc.
o Nonorganic search results: Google AdWords, Bing Ads,
Facebook Ads, etc.
◇SEO increases Website Traffic
Search Engine Optimization
(SEO)..
How to get listed on the top
(free)?
◇ Hundreds of “ranking factors”
◇ Ranking is mostly based on relevance and
popularity:
Keywords:
o Title page keywords/meta tags
o Keyword rich URL structure
Website structure and high-quality contents
Social media presence/profiles: Fan pages, blogs,
Wikipedia articles, Twitter, etc.
Website Keywords and Meta
tags
Conclusion
◇ To understand Action Analytics
◇ Describe different types of actions
◇ To understand Search Engine Analytics
◇ Search Engine KPIs
Session-10:
Capturing Value
with Location and
Mobile Analytics
Introduction
Postal Address
Latitude and Longitude
GPS-Based
IP-Based
Bluetooth/Sensors/Beacons/
Categories of Location Analytics
Business data-driven location analytics deals with mapping, visualizing, and mining location data to reveal
patterns, trends, and relationships hidden in tabular business data (e.g., sales by regions, states, country).
Categories of Location Analytics
1. Powerful Intelligence
2. Geo-Enrichment
3. Collaboration and Sharing
Powerful Intelligence
Using sophisticated mapping techniques, such as heat mapping,
data aggregation (e.g., aggregating data to regions), and color-
coded mapping, can generate powerful business intelligence.
Powerful Intelligence
1. freedom of movement
2. freedom from being observed
What Is Mobile Analytics?
Apps Analytics
Apps analytics deal with understanding and
analyzing mobile application users’ characteristics,
actions, and behaviors.
Native Apps
Native apps are specifically created
for and installed on mobile devices.
Pros:
• Available in app stores.
• Easy to monetize
• Fast
• Adjust well to their native platforms
Cons:
• Are device specific.
• Expensive to develop.
• Can only be accessed through specific mobile devices.
Copyright © 2018 Gohar F. Khan
Types of Apps
Web-Based Apps
Web-based apps look like native apps, but in reality
they are websites optimized for mobile access.
Pros:
• Less costly to develop & maintain.
• Device independent
• Can be accessed from any mobile device.
• Can be accessed through Internet browsers.
Cons:
• Not available in app stores
• Slow
• Hard to monetize
Types of Apps
Hybrid Apps
A hybrid app combines the functionalities of
both native and web-based apps.
• Always On
• Moveable
• Location Awareness
• Focused
• Personalization
• Short-Term Use
• Easy to Use
Three Strategies
Do-It-Yourself
Outsource It
Go Open Source
In-Links
Out-Links
Co-Links
Types of Hyperlinks
In-Links
In-links are the incoming
hyperlinks or links directed
toward a website or originated in
other websites.
Types of Hyperlinks
In-Links
In-links are of great interest to social
marketers, because they bring traffic
to a particular website.
Quality of in-links
Number of in-links
Types of Hyperlinks
Out-Links
Out-links are hyperlinks generated
out of a website.
Types of Hyperlinks
Co-Links
If two websites receive a link from a
third website, they are considered to
be connected indirectly.
A comparative analysis
of out-links (embedded
in tweets) between
tweets of the Korean
and US governments by
Khan et al. (2014).
Figure 2. Twitter networks for Korea (left) and the US Governments (right)
Hyperlink Analytics Tools
The Problem
The Solutions
The Results
Copyright © 2015 Gohar F. Khan
Tutorial: Hyperlinks Analytics with
VOSON
Chapter 12:
Capturing Value
with Multimedia
Analytics
Label Detection
Explicit Content Detection
Logo Detection
Optical Character Recognition
Face Detection
Image Attributes
Web Detection
Brand Mentions
Sentiment Analysis
Measure Sponsorship ROI
Find Visual Influencers
Identify Moments of Consumption
Application: Description
Recognition: Video analytics is used for facial and object recognition (e.g., number plates) to recognize, and therefore possibly identify, persons or objects. For example, the IBM video analytics tool allows users to enroll facial images of objects and people of interest in a watch list, and the system compares them with faces captured by body cameras. High-quality matches are ranked for analyst review.
Detect changes in patterns: From live-streaming fixed cameras, receive automatic alerts when movement of objects (people and vehicles) is inconsistent with predefined patterns.
Flame and smoke detection: Internet-connected video surveillance cameras can be used to detect flame and smoke in 15–20 seconds or even less because of the built-in smart microchip. The microchip processes are capable of analyzing flame and smoke characteristics such as color chrominance, flickering ratio, shape, pattern and moving direction.
Egomotion estimation: Egomotion estimation is used to determine the location of a camera by analyzing its output signal.
Motion detection: Motion detection is used to determine the presence of relevant motion in the observed scene.
Shape recognition: Shape recognition is used to recognize shapes in the input video, for example circles or squares. This functionality is typically used in more advanced functionalities such as object detection.
Style detection: Style detection is used in settings where the video signal has been produced, for example, for television broadcast. Style detection detects the style of the production process.
Tamper detection: Tamper detection is used to determine whether the camera or output signal is tampered with.
Video tracking: Video tracking is used to determine the location of persons or objects in the video signal, possibly with regard to an external reference grid.
Video Stats Analytics
Stiff competition
Low Attention Span
Viral Sharing
Big Brother
Throttling and Network Congestion
ChannelMeter
Vidooly
Quintly
RankTrackr
Socialbakers
Ooyala
Objectives
◇ Understanding Social Risks: Legal, Privacy, and
Security Risk
◇ Securing your social media
◇ Social Media Risk Management: Identify,
assess, prioritize, and mitigate
Security Risks
Ethical Risks
Technological Risks
Social Risks
Economical Risks
Intellectual Property
◇ Intellectual property rights (IPRs) are the rights
granted to the creators of IP
■ Copyright
■ Trademarks
■ Patents
◇ Materials could be ideas, documents, pictures, songs,
etc.
Trade Secrets
◇ Not generally known to public; it can be a formula, an algorithm, design,
recipe, process, method, etc.
◇ Can easily be leaked on social media, often mistakenly
◇ An employee leaking trade secrets through social media (or any
other medium) breaches:
• Duty of loyalty
• Contractual confidentiality
• Non-disclosure agreement, etc.
Trade Secrets
◇ How to stop the breaching process?
■ Updated social media use policies: (e.g., what
employees should not discuss about the company’s
plans on social media).
■ Updated non-disclosure agreement: A legal
document that prohibits material, knowledge, or information
exchange with third parties
■ Training of the employees
Trademark
◇ Confusing customers about the trademark
■ You may be liable for creating a brand
handle/account similar to other brands that confuses
other social media users
■ There may also be liability for the use of other
trademarks in Google AdWords, keywords, and
meta tags
Spam
◇ Sending unwanted messages/posts
■ A commercial message sent without the
consent of the recipient is considered spam
◇ To comply:
■ Obtain consent
■ Provide opt-in and opt-out options
■ Keep your message headers honest
Privacy
◇ “the right to be let alone.” (Warren and
Brandeis, 1890)
◇ Privacy in the context of social media is the “ability
to decide what information one discloses or
withholds about oneself on social media, who has
access to such information, and for what purposes
one’s information may or may not be used.”
Securing your social media
platforms
◇ Use strong passwords
◇ Multi-factor authentication
◇ Trusted contact
◇ Disable or revoke third-party apps
◇ Login notification
◇ Review your login history
Securing your social media
platforms
◇ Multi-factor authentication
■ Extra layer of security
■ If someone figures out a user's password, they
will still not be able to access the account
unless they also have access to a security
token, such as a 4-digit PIN.
Securing your social media
platforms
◇ Disable or revoke third-party apps
■ Third-party apps are developed by other
companies but have access to Facebook via its
application programming interface (API).
■ May handle your account information
insecurely
Securing your social media
platforms
◇ Trusted contact:
■ The trusted contact is an account recovery
feature provided by Facebook to help you
access your account securely through your
friends if you have trouble accessing your
account.
■ Trusted contacts can send a code and URL
from Facebook to help you login
Securing your social media
platforms
◇ Login notification:
■ Enable your Facebook login notification so that you
can be notified through e-mail or text message when
your account is accessed.
◇ Review your login history:
■ It is good practice to regularly review your account
login history and locations
Social Media Risk Management
Framework
◇ The social media risk management loop consists of four
iterative steps:
■ Identify
■ Assess
■ Mitigate, and
■ Evaluate
Social Media Risk Management
Framework
Identify → Assess → Mitigate → Evaluate → (back to Identify)
Assess:
• Risk impact and probability assessment
• Risk prioritization
Mitigate:
• Risk mitigation planning and strategy
• Risk mitigation implementation
Risk Assessment
◇ The risk assessment process determines the
likelihood of a social media risk event that could impact
the organization:
■ Legally
■ Economically
■ Technically
■ Politically, and
■ Socially
Risk Assessment
◇ Risks are prioritized and ranked based on probability
of occurrence and impact on an organization.
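One simple way to rank risks is to multiply each risk's probability by its impact and sort by the result. The slides do not specify a scoring formula, so the product used below, the 1–5 impact scale, and the example risks and numbers are all assumptions for illustration.

```python
def rank_risks(risks):
    """risks: {name: (probability 0-1, impact 1-5)}; highest score first."""
    # Assumed scoring rule: score = probability x impact.
    scored = {name: p * impact for name, (p, impact) in risks.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)

# Hypothetical risk register.
risks = {
    "trade secret leak": (0.2, 5),
    "account hijacking": (0.4, 4),
    "spam complaint": (0.6, 1),
}
ranking = rank_risks(risks)
for name, score in ranking:
    print(f"{name}: {score:.1f}")
```

A likely but low-impact risk (the spam complaint) ends up below a less likely but severe one, which is the trade-off the prioritization step is meant to capture.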
Risk Mitigation
◇ Typical risk mitigation strategies are:
■ Risk management governance
■ Training and awareness
■ Social media policy
■ Securing social media platform