
UNIT – IV – MINING SOCIAL WEB

• Mining the social web refers to the process of extracting and analyzing data from social
media platforms and online communities.
• Data can be categorized into:
1. Textual content: public posts, comments, messages, and reviews.
2. Network information: connections between users, groups, and communities.
3. Multimedia content: images, videos, and audio files shared on the platform.
Overview of Mining Twitter
• Twitter, with its constant stream of public posts, offers a rich data source for
individuals and organizations seeking to understand public opinion, identify trends,
and gain valuable insights.
• This process of extracting and analyzing information from Twitter is known as
Twitter mining.
Why Twitter?
• Public data: A significant portion of tweets are publicly available, providing a vast
amount of data for analysis.
• Real-time insights: With constant updates, Twitter offers a valuable resource for
understanding current events and trends.
• Specificity: Twitter allows filtering data based on hashtags, keywords, and user
mentions for targeted analysis.
• API access: Twitter provides a well-documented Application Programming Interface
(API) allowing developers to programmatically access and analyze tweets.
Mining of Twitter
• An interest graph is a way of modeling connections between people and
their arbitrary interests.
• Interest graphs open up a profound number of possibilities in the data
mining realm, primarily measuring correlations between things with the
objective of making intelligent recommendations and supporting other
applications in machine learning.
• Data Collection: Utilize the Twitter API to gather tweets based on specific
criteria like keywords, hashtags, or user mentions.
• Data Preprocessing: Clean and prepare the collected data by removing
irrelevant information like URLs, emojis, and special characters. Techniques
like text normalization and tokenization are often used.
• Analysis: Apply various analytical techniques like sentiment analysis, topic
modeling, and network analysis to extract insights from the data.
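The preprocessing step described above can be sketched in Python. The following is a minimal illustration using only the standard library; the function name and example text are made up for demonstration:

```python
import re

def preprocess(tweet_text):
    """Clean a raw tweet and split it into lowercase tokens.

    Strips URLs, user mentions, and special characters, then
    normalizes case and tokenizes on whitespace.
    """
    text = re.sub(r"https?://\S+", "", tweet_text)   # remove URLs
    text = re.sub(r"@\w+", "", text)                 # remove user mentions
    text = re.sub(r"[^A-Za-z0-9#\s]", "", text)      # drop emojis/punctuation (keep hashtags)
    return text.lower().split()                      # normalize case and tokenize

tokens = preprocess("Great talk by @speaker on #DataMining! https://t.co/xyz")
# tokens -> ['great', 'talk', 'by', 'on', '#datamining']
```

The resulting token list is what downstream steps such as sentiment analysis or topic modeling would consume.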
Exploring Twitter’s API
• Twitter might be described as a real-time, highly social microblogging service
that allows users to post short status updates, called tweets, that appear on
timelines.
• Tweets may include one or more entities in their 140 characters of content
and reference one or more places that map to locations in the real world.
• An understanding of users, tweets, and timelines is particularly essential to
effective use of Twitter’s API.
• Tweets come bundled with two additional pieces of metadata:
1. Entities: the user mentions, hashtags, URLs, and media that may be
associated with a tweet.
2. Places: locations in the real world that may be attached to a tweet.
• Timelines are chronologically sorted collections of tweets; a timeline is
any particular collection of tweets displayed in chronological order.
• For a Twitter user, the home timeline is the view you see when you log
into your account: all of the tweets from the users you are following.
• A particular user timeline, by contrast, is a collection of tweets only
from a certain user.
• Streams are samples of public tweets flowing through Twitter in real time.
• The public firehose of all tweets has been known to peak at hundreds of
thousands of tweets per minute during events with particularly wide
interest, such as presidential debates.
• A small random sample of the public timeline is available, providing
filterable access to enough public data for API developers to build
powerful applications.
Creating a Twitter API connection
• Twitter exposes a simple RESTful API that is easy to use.
• A Python package that wraps the Twitter API and mimics the public API
semantics almost one-to-one is twitter:
pip install twitter
• Before you can make any API requests to Twitter, you’ll need to create an application
at https://dev.twitter.com/apps.
• Creating an application is the standard way for developers to gain API access and for
Twitter to monitor and interact with third-party platform developers as needed.
• The process for creating an application is pretty standard, and all that’s needed is
read-only access to the API.
• OAuth (Open Authorization) is a means of allowing users to authorize
third-party applications to access their account data without needing to
share sensitive information like a password.
• The key pieces of information needed when developing an
application:
1. consumer key
2. consumer secret
3. access token
4. access token secret
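With these four credentials, a connection can be created via the twitter package. This is a minimal sketch following the package's usual OAuth pattern; the placeholder strings must be replaced with the values from your own application at https://dev.twitter.com/apps:

```python
import twitter

# Placeholder credentials -- substitute the values shown on your
# application's settings page.
CONSUMER_KEY = "your-consumer-key"
CONSUMER_SECRET = "your-consumer-secret"
OAUTH_TOKEN = "your-access-token"
OAUTH_TOKEN_SECRET = "your-access-token-secret"

# Build the OAuth credentials object and wrap it in an API connection.
auth = twitter.oauth.OAuth(OAUTH_TOKEN, OAUTH_TOKEN_SECRET,
                           CONSUMER_KEY, CONSUMER_SECRET)
twitter_api = twitter.Twitter(auth=auth)
```

Once constructed, `twitter_api` exposes the REST resources as attribute-style calls.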
• Twitter imposes rate limits on how many requests an application can
make to any given API resource within a given time window.
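A common way to respect these rate limits is to track how many requests remain in the current window and, when the budget is exhausted, sleep until the window resets. The helper below is an illustrative sketch (the function name is made up; the two inputs correspond to the kind of remaining-count and reset-time information Twitter reports per resource):

```python
import time

def seconds_until_next_request(remaining, reset_epoch, now=None):
    """Return how many seconds to wait before the next API request.

    `remaining` is the number of requests left in the current window;
    `reset_epoch` is the Unix time at which the window resets.
    """
    now = time.time() if now is None else now
    if remaining > 0:
        return 0.0                       # budget left: no need to wait
    return max(0.0, reset_epoch - now)   # sleep until the window resets
```

A polling loop would call this before each request and `time.sleep()` for the returned duration.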
Analyzing the 140 Characters
• The online documentation is always the definitive source for Twitter
platform objects.
• Let's assume that we have extracted a single tweet from the search results
and stored it in a variable named t.
• For example, t.keys() returns the top-level fields for the tweet and t['id']
accesses the identifier of the tweet.
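For instance, with a pared-down, made-up tweet dictionary shaped like the results the Search API returns (the values below are illustrative, not real data), those accessors behave as follows:

```python
# A simplified sample tweet: real tweets carry many more top-level fields.
t = {
    "id": 316948241264549888,
    "text": "Mining the social web with #Python http://example.com",
    "entities": {
        "hashtags": [{"text": "Python", "indices": [27, 34]}],
        "urls": [{"url": "http://example.com", "indices": [35, 53]}],
        "user_mentions": [],
    },
}

print(sorted(t.keys()))   # top-level fields: ['entities', 'id', 'text']
print(t['id'])            # the tweet's identifier
# Entities are plain nested dictionaries, so hashtags are easy to pull out:
print([h["text"] for h in t["entities"]["hashtags"]])  # ['Python']
```

Because a tweet is just a parsed JSON dictionary, the same indexing pattern works for any of its fields.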
