Instagram Data Analysis Using Panoply and Mode - by Ka Hou Sio - Towards Data Science

2/5/23, 9:08 PM Instagram Data Analysis Using Panoply and Mode | by Ka Hou Sio | Towards Data Science
Published in Towards Data Science
You have 1 free member-only story left this month. Sign up for Medium and get an extra one
Ka Hou Sio Follow
May 28, 2019 · 9 min read · · Listen
Save
Instagram Data Analysis Using Panoply and

Mode
picture credits to panoply.io
Background
This project is built on top of the data challenge that Panoply has released in Apr 2019.
Panoply is a cloud data warehouse that you could gather data from different data
sources (i.e. AWS S3, Google analytics and etc.) easily into one place and then connect
to different Business Intelligence tools (i.e. Chartio, Mode and etc.) for analytics and
insights.
https://towardsdatascience.com/instagram-data-analysis-ce03aa4a472a 1/25
Panoply has recently integrated their data warehouse with Instagram API to collect
data. This challenge is about using Panoply as an ETL tool to explore Instagram data
for marketing use (i.e. promotion, segmentation and etc.).
In this challenge, challengers are asked to set up a Panoply account, and connect to
self Instagram or provided Instagram data to perform analyses, draw insights and
build visualization for storytelling. If you have an Instagram account, you can use the
data from your own Instagram account. Or if you prefer to use the data provided by
Panoply, you can choose from 2 accounts:
Kirsten Alana: A top Instagram Influencer.
Shinesty: An innovative and edgy apparel brand.
Then you can use any of your choice of BI tools for data visualization. The final
deliverable of this challenge will be in English, data visualization(s) for communicating
your findings and the SQL queries that you have used.
Project Design
Steps of my project would be:
1. Explore Panoply, using resources it provided for the challenge and documentations
on its own website.
2. Create Panoply free trial account, connect to provided data source, connect to BI
tool.
3. Explore the datasets, understand the schemas and relationships.
4. Research and come up with questions/metrics/KPIs that an Instagram owner

would like to track and have them answered. So they could get insights to grow
their brand.
5. Use Mode to build visualizations and submit to the challenge.
Tools
The tools and technologies that I am going to use in this project would be Panoply,
Instagram data, SQL and Mode.
Process
I started by reading the resources that Panoply provided and explore their website to
better understand what Panoply is and how it plays a role in data analytics process.
Then I created the free trial of Panoply account and follow their documentations to
connect to Amazon S3 to collect the provided Instagram data.
Create a Panoply account

Below is the flow for singing up an Panoply account:
1. Go to https://panoply.io.
panoply.io landing page screenshot
2. Sign up a free 21-day trail account with an email account.
sign-up page screenshot
Connect to data sources and collect data

Below is the flow for connecting to Amazon S3 and collect the Instagram data using
APIs. In this challenge, I used the data from Shinesty, an innovative and edgy apparel
brand as my analytic target.
1. Go to Data Sources tab.
Data Sources tab screenshot
2. Select the data source you want to connect to, for this project I used Amazon S3.
Input your credentials and start to collect data from your source to Panoply data
warehouse.
Open in app Sign up Sign In
Search Medium
Amazon S3 api info page screenshot
3. Your tables are ready to go.
Tables tab screenshot
Tables for analysis

After collection of data, I found out that there are 31 tables in total. After exploration
and examination, I identified the 5 most useful tables and I used them to draw
meaningful insights. Below are the tables and the columns that I have used.
312 1
shinestyinstagram_instagram_user table screenshot
shinestyinstagram_instagram_media table screenshot
shinestyinstagram_instagram_comment table screenshot

shinestyinstagram_instagram_media_tags table screenshot
shinestyinstagram_instagram_media_location table screenshot
Metrics
After my research and I identified some metrics and questions that an Instagram
account owner would like to know in order to improve their account awareness given
the data the API provided.
Engagement per each hashtag.
Performance over time.
Performance by day of week using when a user comment.
Performance by hour of day using when a user made a comment.
Performance by hour of day using when the post is created.
Performance by types of media.
Performance for different media by filter.
Performance by location tag used.
Performance difference for posts that has location tag versus there is no tag.
Top active commenters.
Cohort Analysis.
Connect Mode for visualization

1. Go to https://mode.com to create a free non-business account.
Mode home page screenshot
2. After you are in your account dashboard, hit the down-arrow on the left that is next
to your name.
home dashboard screenshot
3. Then a new drop down menu will come up and select the Connect a Database tab.
drop down menu for connecting a database screenshot
4. Then you can connect to different database using your credentials.
connect to database screenshot
For this project, we used Amazon Redshift because that is the database that Panoply
used to store our tables. After you entered and collected to your Panoply database,
Mode will start to collect all the tables to your account database in Mode. Once it is
done, you can start using SQL to analyze your data in Mode.
using SQL to analyze the data in Mode screenshot
Analysis
Below are the queries that I have used for answering the questions in the Metric
section.
Finding engagement for each hashtag used:
with t AS
(SELECT
value as hashtag,
likes_count as likes,
comments_count as comments
FROM public.shinestyinstagram_instagram_media m
left JOIN public.shinestyinstagram_instagram_media_tags mt
ON m.id = mt.instagram_media_id
)
select
hashtag,
AVG(likes) as avg_likes,
AVG(comments) as avg_comments
from
t
where hashtag is not null
group by 1
Below visual tells you the average likes for each hashtag, #housetonstrong and
#theperfectcrime have the most average likes performance.
Average Likes vs. Hashtag
Below visual tells you the average comments with respects to each hashtag used,
#buttstuff and #macrobrews have the best performance.
Average Comments vs. Hashtag
Below visual combine the above twos.
Performance By Hashtag
Find out performance over time:
SELECT
DATE_TRUNC(‘week’, created_time)::DATE as week,
SUM(comments_count) AS total_comments,
AVG(comments_count) AS avg_comments,
SUM(likes_count) AS total_likes,
AVG(likes_count) AS avg_likes,
count(distinct id) as nums_of_post
FROM
public.shinestyinstagram_instagram_media
GROUP BY
1
ORDER BY
1
Performance Over Time
Finding performance by day of week using when a user comment:
select
TO_CHAR(created_time, ‘DY’) as day,
COUNT(distinct media_id) AS nums_of_post_got_commented,
COUNT(distinct from_username) AS nums_of_commenter,
ROUND((nums_of_commenter/ cast(nums_of_post_got_commented as FLOAT)),
0) as average_commenter_per_post
from
public.shinestyinstagram_instagram_comments
group by
1
order by
1
Below visual shows that Thursday and Friday are the time when the users like to
comment the most.
Engagement Performance by Day of Week
Finding performance by hour of day using when a user made a comment:
select
TO_CHAR(created_time, ‘HH24’) as hour,
COUNT(distinct media_id) AS nums_of_post_got_commented,
COUNT(distinct from_username) AS nums_of_commenter,
ROUND((nums_of_commenter/ cast(nums_of_post_got_commented as FLOAT)),
0) as average_commenter_per_post
from
group by
1
order by
1
Engagement Performance by Hour
Finding performance by hour of day using when the post is created:
SELECT
TO_CHAR(created_time, ‘HH24’) as hour,
SUM(comments_count) AS total_comments,
AVG(comments_count) AS avg_comments,
SUM(likes_count) AS total_likes,
AVG(likes_count) AS avg_likes,
count(distinct id) as nums_of_post
FROM
GROUP BY
1
ORDER BY 1
The purpose of me showing this query is because I think it is not appropriate to use
this insight to make a recommendation to post a IG post during the 7am to 4pm of a
day. Because this query is using when a post is created to calculate the numbers of
likes and comments. In contrast, the last visual used when a user has commented to
calculate the performance, which I think would be more accurate. So I would like to
suggest @shinestythreads to post during 11pm to 2am of a day to get more comment
engagement. Unfortunately the API didn’t provide the same info for likes, which I
would like to use to calculate for like engagement.
Performance by Hour
Finding performance by types of media:
SELECT
type,
SUM(likes_count) as total_likes,
AVG(likes_count) as avg_likes,
SUM(comments_count) as total_comments,
AVG(comments_count) as avg_comments,
COUNT(distinct id) as nums_of_post
FROM
GROUP BY
1
There are three types of medias that IG offers right now: image, video and carousel.
Below visuals show the video media has the most average comments and the carousel
media has the most average likes. And the video media shows itself is the best media to
get engagement.
Performance by Media Type using average comments
Performance by Media Type using average likes
Finding performance for different media by filter:
SELECT
filter,
type,
AVG(comments_count) as avg_comments,
COUNT(distinct id) as nums_of_post
FROM
GROUP BY
1, 2
In below visual, we can see the filter named Crema has the best performance beside
the Normal filter for image media.
Performance by Filter for Image Media
In below visual, we can see the filter named Ashby has the best performance beside
the Normal filter for video media.
Performance by Filter for Video Media
For carousel media, we can do the same to get insight but I didn’t do it in here because
@shinestythreads only has normal filter for carousel media.
Find out performance by location tag used:
SELECT
location,
AVG(comments_count) as avg_comments
FROM
(SELECT
name as location,
m.likes_count,
m.comments_count
FROM
public.shinestyinstagram_instagram_media_location l
LEFT JOIN public.shinestyinstagram_instagram_media m
ON l.instagram_media_id = m.id
) as t
GROUP BY
1
This visual shows us that among all the posts that has a location tag, Augusts National
Golf Club has the most average likes and comments engagement.
Performance by Location Tag
Find out performance difference for posts that has location tag versus there is no
tag:
WITH t AS
(SELECT
m.id,
m.likes_count,
m.comments_count,
l.name as location
FROM
public.shinestyinstagram_instagram_media m
LEFT JOIN
public.shinestyinstagram_instagram_media_location l
ON
m.id = l.instagram_media_id
),
w as
(SELECT
*,
(CASE WHEN location IS NULL THEN 0 ELSE 1 END) AS have_location
FROM t
)
SELECT
have_location,
AVG(comments_count) as avg_comments
FROM
w
GROUP BY
1
Posts with location tag has more average likes but slightly less average comments
compare to posts that has no location tag.
Performance of Location Tag vs. No Location Tag
Find out top active commenters:
SELECT
*
FROM
(SELECT
from_username as username,
COUNT(media_id) as nums_of_comments,
RANK() OVER(ORDER BY nums_of_comments DESC)
FROM
GROUP BY
1
ORDER BY
2 DESC
) as t
WHERE
rank >1 and rank <=15
This visual shows us who are the top active commenters (users), not including the user
@shinestythreads.
Most Active Commenters
Cohort Analysis:
with t AS
(select
media_id,
from_username as username,
DATE_TRUNC('week', created_time) as week

from
),
w AS
(select
username,
min(week) as first_time_commenting
from
t
GROUP by
1
)
SELECT
x.cohort::DATE AS week,
MAX(x.week_number) OVER (PARTITION BY x.cohort) AS total_nums_of_week,
x.week_number,
MAX(x.nums_of_commenter) OVER (PARTITION BY x.cohort) AS
nums_of_new_commenter,
x.nums_of_commenter,
x.nums_of_commenter/MAX(x.nums_of_commenter) OVER (PARTITION BY
x.cohort)::FLOAT AS retention_rate
FROM
(SELECT
w.first_time_commenting as cohort,
FLOOR(EXTRACT('day' FROM t.week - w.first_time_commenting)/7) AS
week_number,
COUNT(DISTINCT t.username) AS nums_of_commenter
FROM
t t
LEFT JOIN
w w
ON
t.username = w.username
GROUP BY
1,2) as x
ORDER BY 1,2,3
Above are all the analyses I have done for the project. For the data I have used, I didn’t
upload them to my Github because of confidential purposes. If you are interested in
learning what resources I have used for this project, please visit my Github repo.
If you have any questions, feel free to comment below. Thank you so much for reading!
Panoply Instagram Etl Data Visualization Data Science
Sign up for The Variable

By Towards Data Science
Every Thursday, the Variable delivers the very best of Towards Data Science: from hands-on tutorials and cutting-edge
research to original features you don't want to miss. Take a look.
By signing up, you will create a Medium account if you don’t already have one. Review
our Privacy Policy for more information about our privacy practices.
Get this newsletter
About Help Terms Privacy
Get the Medium app

Instagram Data Analysis Using Panoply and Mode - by Ka Hou Sio - Towards Data Science

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Instagram Data Analysis Using Panoply and Mode - by Ka Hou Sio - Towards Data Science

Uploaded by

Copyright:

Available Formats

2/5/23, 9:08 PM Instagram Data Analysis Using Panoply and Mode | by Ka Hou Sio | Towards Data Science

Published in Towards Data Science

Ka Hou Sio Follow

May 28, 2019 · 9 min read · · Listen

Instagram Data Analysis Using Panoply and

picture credits to panoply.io

Kirsten Alana: A top Instagram Influencer.

Shinesty: An innovative and edgy apparel brand.

3. Explore the datasets, understand the schemas and relationships.

4. Research and come up with questions/metrics/KPIs that an Instagram owner

5. Use Mode to build visualizations and submit to the challenge.

Create a Panoply account

panoply.io landing page screenshot

2. Sign up a free 21-day trail account with an email account.

sign-up page screenshot

Connect to data sources and collect data

1. Go to Data Sources tab.

Data Sources tab screenshot

Open in app Sign up Sign In

Amazon S3 api info page screenshot

3. Your tables are ready to go.

Tables tab screenshot

Tables for analysis

shinestyinstagram_instagram_user table screenshot

shinestyinstagram_instagram_media table screenshot

shinestyinstagram_instagram_comment table screenshot

shinestyinstagram_instagram_media_tags table screenshot

shinestyinstagram_instagram_media_location table screenshot

Engagement per each hashtag.

Performance over time.

Performance by day of week using when a user comment.

Performance by hour of day using when a user made a comment.

Performance by hour of day using when the post is created.

Performance by types of media.

Performance for different media by filter.

Performance by location tag used.

Top active commenters.

Connect Mode for visualization

Mode home page screenshot

home dashboard screenshot

drop down menu for connecting a database screenshot

4. Then you can connect to different database using your credentials.

connect to database screenshot

using SQL to analyze the data in Mode screenshot

Finding engagement for each hashtag used:

Average Likes vs. Hashtag

Average Comments vs. Hashtag

Below visual combine the above twos.

Find out performance over time:

Performance Over Time

Finding performance by day of week using when a user comment:

Engagement Performance by Day of Week

Finding performance by hour of day using when a user made a comment:

Engagement Performance by Hour

Finding performance by hour of day using when the post is created:

Finding performance by types of media:

Performance by Media Type using average comments

Performance by Media Type using average likes

Finding performance for different media by filter:

Performance by Filter for Image Media

Performance by Filter for Video Media

Find out performance by location tag used: