You are on page 1of 25

2/5/23, 9:08 PM Instagram Data Analysis Using Panoply and Mode | by Ka Hou Sio | Towards Data Science

Published in Towards Data Science

You have 1 free member-only story left this month. Sign up for Medium and get an extra one

Ka Hou Sio Follow

May 28, 2019 · 9 min read · · Listen

Save

Instagram Data Analysis Using Panoply and


Mode

picture credits to panoply.io

Background
This project is built on top of the data challenge that Panoply has released in Apr 2019.
Panoply is a cloud data warehouse that you could gather data from different data
sources (i.e. AWS S3, Google analytics and etc.) easily into one place and then connect
to different Business Intelligence tools (i.e. Chartio, Mode and etc.) for analytics and
insights.

https://towardsdatascience.com/instagram-data-analysis-ce03aa4a472a 1/25
2/5/23, 9:08 PM Instagram Data Analysis Using Panoply and Mode | by Ka Hou Sio | Towards Data Science

Panoply has recently integrated their data warehouse with Instagram API to collect
data. This challenge is about using Panoply as an ETL tool to explore Instagram data
for marketing use (i.e. promotion, segmentation and etc.).

In this challenge, challengers are asked to set up a Panoply account, and connect to
self Instagram or provided Instagram data to perform analyses, draw insights and
build visualization for storytelling. If you have an Instagram account, you can use the
data from your own Instagram account. Or if you prefer to use the data provided by
Panoply, you can choose from 2 accounts:

Kirsten Alana: A top Instagram Influencer.

Shinesty: An innovative and edgy apparel brand.

Then you can use any of your choice of BI tools for data visualization. The final
deliverable of this challenge will be in English, data visualization(s) for communicating
your findings and the SQL queries that you have used.

Project Design
Steps of my project would be:

1. Explore Panoply, using resources it provided for the challenge and documentations
on its own website.

2. Create Panoply free trial account, connect to provided data source, connect to BI
tool.

3. Explore the datasets, understand the schemas and relationships.

4. Research and come up with questions/metrics/KPIs that an Instagram owner


would like to track and have them answered. So they could get insights to grow
their brand.

5. Use Mode to build visualizations and submit to the challenge.

Tools
The tools and technologies that I am going to use in this project would be Panoply,
Instagram data, SQL and Mode.
https://towardsdatascience.com/instagram-data-analysis-ce03aa4a472a 2/25
2/5/23, 9:08 PM Instagram Data Analysis Using Panoply and Mode | by Ka Hou Sio | Towards Data Science

Process
I started by reading the resources that Panoply provided and explore their website to
better understand what Panoply is and how it plays a role in data analytics process.
Then I created the free trial of Panoply account and follow their documentations to
connect to Amazon S3 to collect the provided Instagram data.

Create a Panoply account


Below is the flow for singing up an Panoply account:

1. Go to https://panoply.io.

panoply.io landing page screenshot

2. Sign up a free 21-day trail account with an email account.

https://towardsdatascience.com/instagram-data-analysis-ce03aa4a472a 3/25
2/5/23, 9:08 PM Instagram Data Analysis Using Panoply and Mode | by Ka Hou Sio | Towards Data Science

sign-up page screenshot

Connect to data sources and collect data


Below is the flow for connecting to Amazon S3 and collect the Instagram data using
APIs. In this challenge, I used the data from Shinesty, an innovative and edgy apparel
brand as my analytic target.

1. Go to Data Sources tab.

Data Sources tab screenshot

https://towardsdatascience.com/instagram-data-analysis-ce03aa4a472a 4/25
2/5/23, 9:08 PM Instagram Data Analysis Using Panoply and Mode | by Ka Hou Sio | Towards Data Science

2. Select the data source you want to connect to, for this project I used Amazon S3.

Input your credentials and start to collect data from your source to Panoply data
warehouse.

Open in app Sign up Sign In

Search Medium

Amazon S3 api info page screenshot

3. Your tables are ready to go.

Tables tab screenshot

Tables for analysis


After collection of data, I found out that there are 31 tables in total. After exploration
and examination, I identified the 5 most useful tables and I used them to draw
https://towardsdatascience.com/instagram-data-analysis-ce03aa4a472a 5/25
2/5/23, 9:08 PM Instagram Data Analysis Using Panoply and Mode | by Ka Hou Sio | Towards Data Science

meaningful insights. Below are the tables and the columns that I have used.

312 1

shinestyinstagram_instagram_user table screenshot

shinestyinstagram_instagram_media table screenshot

shinestyinstagram_instagram_comment table screenshot


https://towardsdatascience.com/instagram-data-analysis-ce03aa4a472a 6/25
2/5/23, 9:08 PM Instagram Data Analysis Using Panoply and Mode | by Ka Hou Sio | Towards Data Science

shinestyinstagram_instagram_media_tags table screenshot

shinestyinstagram_instagram_media_location table screenshot

Metrics
After my research and I identified some metrics and questions that an Instagram
account owner would like to know in order to improve their account awareness given
the data the API provided.

Engagement per each hashtag.

Performance over time.

Performance by day of week using when a user comment.

Performance by hour of day using when a user made a comment.

Performance by hour of day using when the post is created.

Performance by types of media.

Performance for different media by filter.

Performance by location tag used.

Performance difference for posts that has location tag versus there is no tag.

Top active commenters.

Cohort Analysis.
https://towardsdatascience.com/instagram-data-analysis-ce03aa4a472a 7/25
2/5/23, 9:08 PM Instagram Data Analysis Using Panoply and Mode | by Ka Hou Sio | Towards Data Science

Connect Mode for visualization


1. Go to https://mode.com to create a free non-business account.

Mode home page screenshot

2. After you are in your account dashboard, hit the down-arrow on the left that is next
to your name.

home dashboard screenshot

3. Then a new drop down menu will come up and select the Connect a Database tab.

https://towardsdatascience.com/instagram-data-analysis-ce03aa4a472a 8/25
2/5/23, 9:08 PM Instagram Data Analysis Using Panoply and Mode | by Ka Hou Sio | Towards Data Science

drop down menu for connecting a database screenshot

4. Then you can connect to different database using your credentials.

connect to database screenshot

For this project, we used Amazon Redshift because that is the database that Panoply
used to store our tables. After you entered and collected to your Panoply database,
Mode will start to collect all the tables to your account database in Mode. Once it is
done, you can start using SQL to analyze your data in Mode.

https://towardsdatascience.com/instagram-data-analysis-ce03aa4a472a 9/25
2/5/23, 9:08 PM Instagram Data Analysis Using Panoply and Mode | by Ka Hou Sio | Towards Data Science

using SQL to analyze the data in Mode screenshot

Analysis
Below are the queries that I have used for answering the questions in the Metric
section.

Finding engagement for each hashtag used:

with t AS
(SELECT
value as hashtag,
likes_count as likes,
comments_count as comments
FROM public.shinestyinstagram_instagram_media m
left JOIN public.shinestyinstagram_instagram_media_tags mt
ON m.id = mt.instagram_media_id
)
select
hashtag,
AVG(likes) as avg_likes,
AVG(comments) as avg_comments
from
t
where hashtag is not null
group by 1

https://towardsdatascience.com/instagram-data-analysis-ce03aa4a472a 10/25
2/5/23, 9:08 PM Instagram Data Analysis Using Panoply and Mode | by Ka Hou Sio | Towards Data Science

Below visual tells you the average likes for each hashtag, #housetonstrong and
#theperfectcrime have the most average likes performance.

Average Likes vs. Hashtag

Below visual tells you the average comments with respects to each hashtag used,
#buttstuff and #macrobrews have the best performance.

Average Comments vs. Hashtag

Below visual combine the above twos.

https://towardsdatascience.com/instagram-data-analysis-ce03aa4a472a 11/25
2/5/23, 9:08 PM Instagram Data Analysis Using Panoply and Mode | by Ka Hou Sio | Towards Data Science

Performance By Hashtag

Find out performance over time:

SELECT
DATE_TRUNC(‘week’, created_time)::DATE as week,
SUM(comments_count) AS total_comments,
AVG(comments_count) AS avg_comments,
SUM(likes_count) AS total_likes,
AVG(likes_count) AS avg_likes,
count(distinct id) as nums_of_post
FROM
public.shinestyinstagram_instagram_media
GROUP BY
1
ORDER BY
1

https://towardsdatascience.com/instagram-data-analysis-ce03aa4a472a 12/25
2/5/23, 9:08 PM Instagram Data Analysis Using Panoply and Mode | by Ka Hou Sio | Towards Data Science

Performance Over Time

Finding performance by day of week using when a user comment:

select
TO_CHAR(created_time, ‘DY’) as day,
COUNT(distinct media_id) AS nums_of_post_got_commented,
COUNT(distinct from_username) AS nums_of_commenter,
ROUND((nums_of_commenter/ cast(nums_of_post_got_commented as FLOAT)),
0) as average_commenter_per_post
from
public.shinestyinstagram_instagram_comments
group by
1
order by
1

Below visual shows that Thursday and Friday are the time when the users like to
comment the most.

https://towardsdatascience.com/instagram-data-analysis-ce03aa4a472a 13/25
2/5/23, 9:08 PM Instagram Data Analysis Using Panoply and Mode | by Ka Hou Sio | Towards Data Science

Engagement Performance by Day of Week

Finding performance by hour of day using when a user made a comment:

select
TO_CHAR(created_time, ‘HH24’) as hour,
COUNT(distinct media_id) AS nums_of_post_got_commented,
COUNT(distinct from_username) AS nums_of_commenter,
ROUND((nums_of_commenter/ cast(nums_of_post_got_commented as FLOAT)),
0) as average_commenter_per_post
from
public.shinestyinstagram_instagram_comments
group by
1
order by
1

https://towardsdatascience.com/instagram-data-analysis-ce03aa4a472a 14/25
2/5/23, 9:08 PM Instagram Data Analysis Using Panoply and Mode | by Ka Hou Sio | Towards Data Science

Engagement Performance by Hour

Finding performance by hour of day using when the post is created:

SELECT
TO_CHAR(created_time, ‘HH24’) as hour,
SUM(comments_count) AS total_comments,
AVG(comments_count) AS avg_comments,
SUM(likes_count) AS total_likes,
AVG(likes_count) AS avg_likes,
count(distinct id) as nums_of_post
FROM
public.shinestyinstagram_instagram_media
GROUP BY
1
ORDER BY 1

The purpose of me showing this query is because I think it is not appropriate to use
this insight to make a recommendation to post a IG post during the 7am to 4pm of a
day. Because this query is using when a post is created to calculate the numbers of
likes and comments. In contrast, the last visual used when a user has commented to
calculate the performance, which I think would be more accurate. So I would like to
suggest @shinestythreads to post during 11pm to 2am of a day to get more comment
engagement. Unfortunately the API didn’t provide the same info for likes, which I
would like to use to calculate for like engagement.

https://towardsdatascience.com/instagram-data-analysis-ce03aa4a472a 15/25
2/5/23, 9:08 PM Instagram Data Analysis Using Panoply and Mode | by Ka Hou Sio | Towards Data Science

Performance by Hour

Finding performance by types of media:

SELECT
type,
SUM(likes_count) as total_likes,
AVG(likes_count) as avg_likes,
SUM(comments_count) as total_comments,
AVG(comments_count) as avg_comments,
COUNT(distinct id) as nums_of_post
FROM
public.shinestyinstagram_instagram_media
GROUP BY
1

There are three types of medias that IG offers right now: image, video and carousel.
Below visuals show the video media has the most average comments and the carousel
media has the most average likes. And the video media shows itself is the best media to
get engagement.

https://towardsdatascience.com/instagram-data-analysis-ce03aa4a472a 16/25
2/5/23, 9:08 PM Instagram Data Analysis Using Panoply and Mode | by Ka Hou Sio | Towards Data Science

Performance by Media Type using average comments

Performance by Media Type using average likes

Finding performance for different media by filter:

SELECT
filter,
type,
SUM(likes_count) as total_likes,
AVG(likes_count) as avg_likes,
SUM(comments_count) as total_comments,
AVG(comments_count) as avg_comments,
COUNT(distinct id) as nums_of_post

https://towardsdatascience.com/instagram-data-analysis-ce03aa4a472a 17/25
2/5/23, 9:08 PM Instagram Data Analysis Using Panoply and Mode | by Ka Hou Sio | Towards Data Science

FROM
public.shinestyinstagram_instagram_media
GROUP BY
1, 2

In below visual, we can see the filter named Crema has the best performance beside
the Normal filter for image media.

Performance by Filter for Image Media

In below visual, we can see the filter named Ashby has the best performance beside
the Normal filter for video media.

https://towardsdatascience.com/instagram-data-analysis-ce03aa4a472a 18/25
2/5/23, 9:08 PM Instagram Data Analysis Using Panoply and Mode | by Ka Hou Sio | Towards Data Science

Performance by Filter for Video Media

For carousel media, we can do the same to get insight but I didn’t do it in here because
@shinestythreads only has normal filter for carousel media.

Find out performance by location tag used:

SELECT
location,
SUM(likes_count) as total_likes,
AVG(likes_count) as avg_likes,
SUM(comments_count) as total_comments,
AVG(comments_count) as avg_comments
FROM
(SELECT
name as location,
m.likes_count,
m.comments_count
FROM
public.shinestyinstagram_instagram_media_location l
LEFT JOIN public.shinestyinstagram_instagram_media m
ON l.instagram_media_id = m.id
) as t
GROUP BY
1

This visual shows us that among all the posts that has a location tag, Augusts National
Golf Club has the most average likes and comments engagement.

https://towardsdatascience.com/instagram-data-analysis-ce03aa4a472a 19/25
2/5/23, 9:08 PM Instagram Data Analysis Using Panoply and Mode | by Ka Hou Sio | Towards Data Science

Performance by Location Tag

Find out performance difference for posts that has location tag versus there is no
tag:

WITH t AS
(SELECT
m.id,
m.likes_count,
m.comments_count,
l.name as location
FROM
public.shinestyinstagram_instagram_media m
LEFT JOIN
public.shinestyinstagram_instagram_media_location l
ON
m.id = l.instagram_media_id
),
w as
(SELECT
*,
(CASE WHEN location IS NULL THEN 0 ELSE 1 END) AS have_location
FROM t
)
SELECT
have_location,
SUM(likes_count) as total_likes,
AVG(likes_count) as avg_likes,
SUM(comments_count) as total_comments,
AVG(comments_count) as avg_comments
FROM
w
GROUP BY
1

Posts with location tag has more average likes but slightly less average comments
compare to posts that has no location tag.

https://towardsdatascience.com/instagram-data-analysis-ce03aa4a472a 20/25
2/5/23, 9:08 PM Instagram Data Analysis Using Panoply and Mode | by Ka Hou Sio | Towards Data Science

Performance of Location Tag vs. No Location Tag

Find out top active commenters:

SELECT
*
FROM
(SELECT
from_username as username,
COUNT(media_id) as nums_of_comments,
RANK() OVER(ORDER BY nums_of_comments DESC)
FROM
public.shinestyinstagram_instagram_comments
GROUP BY
1
ORDER BY
2 DESC
) as t
WHERE
rank >1 and rank <=15

This visual shows us who are the top active commenters (users), not including the user
@shinestythreads.

https://towardsdatascience.com/instagram-data-analysis-ce03aa4a472a 21/25
2/5/23, 9:08 PM Instagram Data Analysis Using Panoply and Mode | by Ka Hou Sio | Towards Data Science

Most Active Commenters

Cohort Analysis:

with t AS
(select
media_id,
from_username as username,

DATE_TRUNC('week', created_time) as week


from
public.shinestyinstagram_instagram_comments
),
w AS

(select
username,
min(week) as first_time_commenting
from
t

GROUP by
1
)
https://towardsdatascience.com/instagram-data-analysis-ce03aa4a472a 22/25
2/5/23, 9:08 PM Instagram Data Analysis Using Panoply and Mode | by Ka Hou Sio | Towards Data Science

SELECT
x.cohort::DATE AS week,
MAX(x.week_number) OVER (PARTITION BY x.cohort) AS total_nums_of_week,
x.week_number,
MAX(x.nums_of_commenter) OVER (PARTITION BY x.cohort) AS
nums_of_new_commenter,
x.nums_of_commenter,
x.nums_of_commenter/MAX(x.nums_of_commenter) OVER (PARTITION BY
x.cohort)::FLOAT AS retention_rate
FROM

(SELECT
w.first_time_commenting as cohort,
FLOOR(EXTRACT('day' FROM t.week - w.first_time_commenting)/7) AS
week_number,
COUNT(DISTINCT t.username) AS nums_of_commenter
FROM
t t
LEFT JOIN
w w
ON

t.username = w.username
GROUP BY
1,2) as x
ORDER BY 1,2,3

https://towardsdatascience.com/instagram-data-analysis-ce03aa4a472a 23/25
2/5/23, 9:08 PM Instagram Data Analysis Using Panoply and Mode | by Ka Hou Sio | Towards Data Science

Above are all the analyses I have done for the project. For the data I have used, I didn’t
upload them to my Github because of confidential purposes. If you are interested in
learning what resources I have used for this project, please visit my Github repo.

If you have any questions, feel free to comment below. Thank you so much for reading!

Panoply Instagram Etl Data Visualization Data Science

Sign up for The Variable


https://towardsdatascience.com/instagram-data-analysis-ce03aa4a472a 24/25
2/5/23, 9:08 PM Instagram Data Analysis Using Panoply and Mode | by Ka Hou Sio | Towards Data Science

By Towards Data Science

Every Thursday, the Variable delivers the very best of Towards Data Science: from hands-on tutorials and cutting-edge
research to original features you don't want to miss. Take a look.

By signing up, you will create a Medium account if you don’t already have one. Review
our Privacy Policy for more information about our privacy practices.

Get this newsletter

About Help Terms Privacy

Get the Medium app

https://towardsdatascience.com/instagram-data-analysis-ce03aa4a472a 25/25

You might also like