1. Problem Statement
This project focuses on analyzing user behavior on Instagram to gain insights into user
engagement, preferences, and overall experience. Key areas include user engagement metrics,
content analysis, demographics, follower growth/churn, influencer analysis,
brand advocacy, story engagement, page factors, and user journey analysis. Privacy and ethical
considerations align with Instagram's terms of service and data usage policies.
2. Importing Libraries
import numpy as np
import pandas as pd
import seaborn as sns
import plotly.express as px
import matplotlib.pyplot as plt
from matplotlib import style
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')
/opt/conda/lib/python3.10/site-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.16.5 and <1.23.0 is requ
  warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
3. Data Importing
df = pd.read_csv('/kaggle/input/instagram-fake-and-real-accounts-dataset/final-v1.csv')
df
   edge_followed_by  edge_follow  username_length  username_has_number  full_name_has_number  full_name_length  is_priva
0             0.001        0.257               13                    1                     1                13
1             0.000        0.958                9                    1                     0                 0
2             0.000        0.253               12                    0                     0                 0
3             0.000        0.977               10                    1                     0                 0
4             0.000        0.321               11                    0                     0                11
df.head(5)
   edge_followed_by  edge_follow  username_length  username_has_number  full_name_has_number  full_name_length  is_private
0             0.001        0.257               13                    1                     1                13           0
1             0.000        0.958                9                    1                     0                 0           0
2             0.000        0.253               12                    0                     0                 0           0
3             0.000        0.977               10                    1                     0                 0           0
4             0.000        0.321               11                    0                     0                11           1
df.tail(5)
df.shape
(785, 13)
df.columns
df.nunique()
edge_followed_by 22
edge_follow 506
username_length 21
username_has_number 2
full_name_has_number 2
full_name_length 30
is_private 2
is_joined_recently 2
has_channel 1
is_business_account 2
has_guides 2
has_external_url 2
is_fake 2
dtype: int64
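The `nunique()` output shows that `has_channel` takes a single value, so it cannot help separate fake from real accounts. A minimal sketch of flagging such constant columns programmatically follows. It uses a small made-up frame; the values and the names `toy`, `constant_cols`, and `cleaned` are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Toy data standing in for the real dataset (values are made up).
toy = pd.DataFrame({
    "username_length": [13, 9, 12, 10],
    "has_channel": [0, 0, 0, 0],  # constant column
    "is_fake": [1, 0, 1, 0],
})

# A column with at most one distinct value carries no information.
unique_counts = toy.nunique()
constant_cols = unique_counts[unique_counts <= 1].index.tolist()
print(constant_cols)

# Drop the constant columns without mutating the original frame.
cleaned = toy.drop(columns=constant_cols)
print(cleaned.columns.tolist())
```

Applied to the real frame, the same filter would be expected to flag `has_channel`, given the unique-value counts listed above.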
df.corr()
# Missing values per column (call inferred from the all-zero output below)
df.isnull().sum()
edge_followed_by 0
edge_follow 0
username_length 0
username_has_number 0
full_name_has_number 0
full_name_length 0
is_private 0
is_joined_recently 0
has_channel 0
is_business_account 0
has_guides 0
has_external_url 0
is_fake 0
dtype: int64
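The `df.corr()` call above returns the full pairwise correlation matrix, which is hard to scan when the question is how each feature relates to `is_fake`. One possible way to narrow it down is to pull out the `is_fake` column of the matrix and sort it. The sketch below uses a made-up frame whose values are constructed so that the correlations are exactly +1 and -1; the name `target_corr` is an assumption for illustration.

```python
import pandas as pd

# Toy data (made up): username_has_number mirrors is_fake,
# is_private is its exact opposite.
toy = pd.DataFrame({
    "username_has_number": [1, 1, 0, 0, 1, 0],
    "is_private":          [0, 0, 1, 1, 0, 1],
    "is_fake":             [1, 1, 0, 0, 1, 0],
})

# Pearson correlation of every feature with the target, strongest first.
target_corr = (
    toy.corr()["is_fake"]
    .drop("is_fake")  # the target's correlation with itself is always 1
    .sort_values(ascending=False)
)
print(target_corr)
```

On real data the values will fall between these extremes, and for 0/1 features a correlation near zero suggests the feature alone says little about authenticity.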
#Statistical summary
df.describe().T
# Drop the columns excluded from the analysis (has_channel has a single unique value)
df.drop(columns=["has_guides", "edge_follow", "has_channel", "edge_followed_by"], inplace=True)
df.head(5)
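   username_length  username_has_number  full_name_has_number  full_name_length  is_private  is_joined_recently
0               13                    1                     1                13           0                   0
1                9                    1                     0                 0           0                   1
2               12                    0                     0                 0           0                   0
3               10                    1                     0                 0           0                   0
4               11                    0                     0                11           1                   0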
5. Data Visualization
account = df.groupby("is_business_account")
account = account.size()
account
is_business_account
0 727
1 58
dtype: int64
df.groupby("is_private").size()
is_private
0 640
1 145
dtype: int64
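Raw counts are easier to compare as shares of the 785 accounts. The sketch below rebuilds the two count tables from the outputs above and converts them to percentages; the variable names are assumptions for illustration.

```python
import pandas as pd

# Counts copied from the outputs above.
business_counts = pd.Series({0: 727, 1: 58}, name="is_business_account")
private_counts = pd.Series({0: 640, 1: 145}, name="is_private")

# Share of each class as a percentage of all accounts.
business_share = (business_counts / business_counts.sum() * 100).round(1)
private_share = (private_counts / private_counts.sum() * 100).round(1)

print(business_share)
print(private_share)
```

On the real frame, `df['is_private'].value_counts(normalize=True)` yields the same shares as fractions without the intermediate step.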
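fake_account_counts = df['is_fake'].value_counts()
# Build labels from the index so they always match the counts (assumes 1 = fake, 0 = real)
labels = ['Yes' if value == 1 else 'No' for value in fake_account_counts.index]
colors = ['skyblue', 'lightgreen']
explode = (0.1, 0)  # Offset the first slice from the centre for emphasis
# The plotting call was missing from the source; plt.pie is assumed here
plt.pie(fake_account_counts, labels=labels, colors=colors, explode=explode, autopct='%1.1f%%')
plt.show()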
def barplot(column, title):
    """Plot the number of accounts for each value of `column`."""
    plt.figure(figsize=(4, 4))
    sns.countplot(x=column, data=df, palette='viridis')
    plt.xlabel(column)
    plt.ylabel("Count")  # countplot's y-axis shows the number of rows
    plt.title(title, fontweight='bold')
    plt.xticks(rotation=45)
    sns.despine()
    plt.tight_layout()
    plt.show()

barplot('is_business_account', "Users have Business Account")
barplot('is_private', "Users have Private Account")
def barplot(column, title):
    """Plot the number of accounts for each value of `column`."""
    plt.figure(figsize=(4, 4))
    sns.countplot(x=column, data=df, palette='viridis')
    plt.xlabel(column)
    plt.ylabel("Count")  # countplot's y-axis shows the number of rows
    plt.title(title, fontweight='bold')
    plt.xticks(rotation=45)
    sns.despine()
    plt.tight_layout()
    plt.show()

barplot('username_has_number', "User name has number")
barplot('full_name_has_number', "Full name has number")
def barplot(column, title):
    """Plot the number of accounts for each value of `column`."""
    plt.figure(figsize=(4, 4))
    sns.countplot(x=column, data=df, palette='viridis')
    plt.xlabel(column)
    plt.ylabel("Count")  # countplot's y-axis shows the number of rows
    plt.title(title, fontweight='bold')
    plt.xticks(rotation=45)
    sns.despine()
    plt.tight_layout()
    plt.show()

barplot('is_joined_recently', "Users are Joined Recently")
barplot('username_length', "Username length")
6. Insights
1. Business Account: Non-business accounts tend to have a higher ratio of fake accounts than business accounts.
2. Private/Public Accounts: Public accounts have a higher ratio of fake accounts than private accounts.
3. Username Has Number: Accounts whose username contains a number have a slightly higher ratio of fake accounts.
4. Full Name Has Number: Accounts whose full name contains a number have a slightly higher ratio of fake accounts.
5. Recently Joined: Accounts that did not join recently have a higher ratio of fake accounts.
6. Username Length: Accounts with shorter usernames tend to have a higher ratio of fake accounts.
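The insights above are stated as ratios of fake accounts, while the plots show only raw counts. One way such ratios could be computed is sketched below, assuming `is_fake` is coded as 0/1 so that its group mean equals the share of fake accounts. The toy frame, its values, and the bin edges are made up for illustration and do not reproduce the dataset's actual ratios.

```python
import pandas as pd

# Toy data standing in for the real dataset (values are made up).
toy = pd.DataFrame({
    "is_private":      [0, 0, 0, 0, 1, 1, 1, 1],
    "username_length": [6, 7, 8, 14, 15, 9, 16, 18],
    "is_fake":         [1, 1, 1, 0, 0, 1, 0, 0],
})

# Because is_fake is 0/1, its mean within a group is the fake-account ratio.
fake_ratio_by_privacy = toy.groupby("is_private")["is_fake"].mean()
print(fake_ratio_by_privacy)

# A numeric feature is binned first; the edges (0, 10, 20) are arbitrary.
length_bin = pd.cut(toy["username_length"], bins=[0, 10, 20], labels=["short", "long"])
fake_ratio_by_length = toy.groupby(length_bin, observed=True)["is_fake"].mean()
print(fake_ratio_by_length)
```

Running the same `groupby(...)["is_fake"].mean()` on the real frame for each feature would give the numbers needed to check each insight directly.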
7. Conclusion
This Instagram user behavior analysis relates account characteristics (business status, privacy, numbers in names, join recency, and username length) to whether an account is fake or real. The trends and patterns it identifies can assist in understanding user behavior on the platform and contribute to enhancing user experience and privacy.