You are on page 1of 1

Problem specification:

Social Media Analytics: you will use Spark and suite of relevant big data tools to
analyze social media data for gaining insights into user behavior, trends, and
sentiment. Data can be collected from various social media platforms, such as
Twitter, Facebook, and LinkedIn. You will analyze the collected data to find out
which topics are are of interest or popular, what people are talking about a
particular brand or product, and how users are engaging with the social media
content. The work will require you to follow different steps of data analytics,
e.g.

1. Data Collection
-- this is the first step
-- collect data from various sources
-- use APIs that are provided by the different providers
-- you can also use third-party tools to download historical mdata
-- the collected data should be stored in a spark-aware data format, e.g. JSON or
CSV
-- In the report, elaborate what you did and describe the data you collected

2. Data Preparation
-- Prepare the data for analysis.
-- The data preparation involves cleaning the data, removing duplicates, and
extracting relevant features, such as user names, post text, hashtags, and URLs.
-- You can use Spark's built-in data processing functions, such as filtering,
mapping, and aggregation, to clean and prepare the data.
-- Report what type of data preparation you did and what features you extracted

3. Data Analysis:
-- You can use Spark's machine learning libraries, such as MLlib, to analyze the
data.
-- For example, perform sentiment analysis to classify posts as positive, negative,
or neutral, or perform topic modeling to identify the most common topics discussed
on social media. If the task is to classify social media content as positive,
negative, or neutral, it can be approached using supervised learning. In this case,
labeled data is needed to train a model to recognize patterns in the text that
indicate positive or negative sentiment. Another example, topic modeling, where the
goal is to identify the underlying themes or topics in a set of social media posts,
can be approached using unsupervised learning.

--Elaborate the process in the report.

4. Data Visualization:
-- Finally, you can use Spark's data visualization libraries, such as SparkSQL, to
visualize the results of your analysis.
-- You can create graphs, charts, and maps to visualize the distribution of
sentiment, topic, or network characteristics to gain insights into user behavior,
trends, and sentiment [optional].
-- Report what type of visualization you worked on.

You might also like