10/9/21, 5:43 PM Top Streamers on Twitch

Introduction
Fardosa Mohamed Salat 657967

Logistic regression for the data set of Twitch streamers

In [1]:
import pandas as pd

In [2]:
import numpy as np

In [3]:
np.random.seed(7)

In [4]:
data=pd.read_csv('twitchdata-update.csv')

In [5]:
data.head()

Out[5]:
    Channel  Watch time(Minutes)  Stream time(minutes)  Peak viewers  Average viewers  Followers  Followers gained  Views gained  Par
0     xQcOW           6196161750                215250        222720            27716    3246298           1734810      93036735
1  summit1g           6091677300                211845        310998            25610    5310163           1370184      89705964
2    Gaules           5644590915                515280        387315            10976    1767635           1023779     102611607
3  ESL_CSGO           3970318140                517740        300575             7714    3944850            703986     106546942
4      Tfue           3671000070                123660        285644            29602    8938903           2068424      78998587

In [6]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 11 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Channel 1000 non-null object
1 Watch time(Minutes) 1000 non-null int64
2 Stream time(minutes) 1000 non-null int64
3 Peak viewers 1000 non-null int64
4 Average viewers 1000 non-null int64
5 Followers 1000 non-null int64
6 Followers gained 1000 non-null int64
7 Views gained 1000 non-null int64
8 Partnered 1000 non-null bool
9 Mature 1000 non-null bool
10 Language 1000 non-null object
dtypes: bool(2), int64(7), object(2)
memory usage: 72.4+ KB

In [7]:
data.isnull().sum()

Out[7]: Channel 0
Watch time(Minutes) 0
Stream time(minutes) 0
Peak viewers 0
Average viewers 0
Followers 0
Followers gained 0
Views gained 0
Partnered 0
Mature 0
Language 0
dtype: int64

In [8]:
data.dtypes

Out[8]:
Channel                 object
Watch time(Minutes)      int64
Stream time(minutes)     int64
Peak viewers             int64
Average viewers          int64
Followers                int64
Followers gained         int64
Views gained             int64
Partnered                 bool
Mature                    bool
Language                object
dtype: object

In [9]:
data.describe()

Out[9]:
       Watch time(Minutes)  Stream time(minutes)   Peak viewers  Average viewers     Followers  Followers gained  V
count         1.000000e+03           1000.000000    1000.000000      1000.000000  1.000000e+03      1.000000e+03  1
mean          4.184279e+08         120515.160000   37065.051000      4781.040000  5.700541e+05      2.055185e+05  1
std           5.496355e+08          85376.201364   60314.307686      8453.684965  8.044134e+05      3.399137e+05  2
min           1.221928e+08           3465.000000     496.000000       235.000000  3.660000e+03     -1.577200e+04  1
25%           1.631899e+08          73758.750000    9113.750000      1457.750000  1.705462e+05      4.375825e+04  3
50%           2.349908e+08         108240.000000   16676.000000      2425.000000  3.180630e+05      9.835200e+04  6
75%           4.337399e+08         141843.750000   37569.750000      4786.250000  6.243322e+05      2.361308e+05  1
max           6.196162e+09         521445.000000  639375.000000    147643.000000  8.938903e+06      3.966525e+06  6

In [10]:
data[data["Followers gained"] == data["Followers gained"].min()]

Out[10]:
           Channel  Watch time(Minutes)  Stream time(minutes)  Peak viewers  Average viewers  Followers  Followers gained  View gaine
656  TSM_TheOddOne            181908120                188445          4363              913     864087            -15772      637094

In [11]:
data.columns

Out[11]: Index(['Channel', 'Watch time(Minutes)', 'Stream time(minutes)',
       'Peak viewers', 'Average viewers', 'Followers', 'Followers gained',
       'Views gained', 'Partnered', 'Mature', 'Language'],
      dtype='object')

In [12]:
import matplotlib.pyplot as plt
import seaborn as sns

In [13]:
fig = plt.figure(figsize=(10,6))
sns.countplot(x="Mature", data=data)
plt.title("Mature (+18) streamers")
plt.show()

In [14]:
languages_values = data["Language"].value_counts()
languages_values

Out[14]:
English       485
Korean         77
Russian        74
Spanish        68
French         66
Portuguese     61
German         49
Chinese        30
Turkish        22
Italian        17
Polish         12
Thai           11
Japanese       10
Czech           6
Arabic          5
Hungarian       2
Slovak          1
Swedish         1
Greek           1
Finnish         1
Other           1
Name: Language, dtype: int64

In [15]:
languages = data["Language"].unique()
languages

Out[15]: array(['English', 'Portuguese', 'Spanish', 'German', 'Korean', 'French',
       'Russian', 'Japanese', 'Chinese', 'Czech', 'Turkish', 'Italian',
       'Polish', 'Thai', 'Arabic', 'Slovak', 'Other', 'Hungarian',
       'Greek', 'Finnish', 'Swedish'], dtype=object)

In [16]:
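fig = plt.figure(figsize=(10, 6))
# Take labels and counts from the same value_counts() result so each bar
# is labelled with the language it actually counts.
top_languages = languages_values[:10]
languages_sns = sns.barplot(x=top_languages.index, y=top_languages.values)
languages_sns.set(xlabel="Language", ylabel="Number of streamers")
plt.title("Twitch Top 10 Languages")
plt.xticks(rotation=45)
plt.show()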
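
`unique()` returns values in order of first appearance, while `value_counts()` sorts by frequency, so slicing the two separately can pair a label with another language's count. The sketch below reproduces the effect with the standard library only; the list `langs` is a made-up toy column, not the Twitch data.

```python
from collections import Counter

# Toy language column (hypothetical values, not the real data).
langs = ["Spanish", "English", "Korean", "English", "Korean", "English"]

# Order of first appearance, analogous to Series.unique().
first_seen = list(dict.fromkeys(langs))

# Frequency order, analogous to Series.value_counts().
by_count = Counter(langs).most_common()
labels = [name for name, _ in by_count]
heights = [count for _, count in by_count]

print("first seen:", first_seen)
print("by count:  ", labels, heights)
```

Taking both the labels and the heights from `by_count` keeps each bar attached to its own language.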

In [17]:
adult = pd.get_dummies(data['Mature'], drop_first=True)

In [18]:
data['Adult'] = adult
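
Since `Mature` is already boolean, the same 0/1 target can probably be obtained with a plain cast. This is a minimal sketch on a made-up frame called `toy`, not the author's method; it assumes only that the column holds `True`/`False` values.

```python
import pandas as pd

# Toy frame standing in for the Twitch data (values are made up).
toy = pd.DataFrame({"Mature": [False, True, False, False]})

# Casting the boolean column yields 1 for mature channels and 0 otherwise,
# the same encoding as the dummy column kept by drop_first=True.
toy["Adult"] = toy["Mature"].astype(int)

print(toy["Adult"].tolist())  # [0, 1, 0, 0]
```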

In [19]:
data.head()

Out[19]:
    Channel  Watch time(Minutes)  Stream time(minutes)  Peak viewers  Average viewers  Followers  Followers gained  Views gained  Par
0     xQcOW           6196161750                215250        222720            27716    3246298           1734810      93036735
1  summit1g           6091677300                211845        310998            25610    5310163           1370184      89705964
2    Gaules           5644590915                515280        387315            10976    1767635           1023779     102611607
3  ESL_CSGO           3970318140                517740        300575             7714    3944850            703986     106546942
4      Tfue           3671000070                123660        285644            29602    8938903           2068424      78998587

In [20]:
data.drop(['Channel','Partnered','Mature','Language'], axis=1, inplace=True)

In [21]:
data.head()

Out[21]:
   Watch time(Minutes)  Stream time(minutes)  Peak viewers  Average viewers  Followers  Followers gained  Views gained  Adult
0           6196161750                215250        222720            27716    3246298           1734810      93036735      0
1           6091677300                211845        310998            25610    5310163           1370184      89705964      0
2           5644590915                515280        387315            10976    1767635           1023779     102611607      1
3           3970318140                517740        300575             7714    3944850            703986     106546942      0
4           3671000070                123660        285644            29602    8938903           2068424      78998587      0

In [22]:
X = data[['Watch time(Minutes)','Stream time(minutes)','Peak viewers','Followers','F
y = data['Adult']

In [23]:
y

Out[23]: 0 0
1 0
2 1
3 0
4 0
..
995 0
996 0
997 0
998 0
999 0
Name: Adult, Length: 1000, dtype: uint8

In [24]:
from sklearn.model_selection import train_test_split

In [25]:
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.2, random_state

In [26]:
from sklearn.metrics import accuracy_score, precision_recall_fscore_support, classif
from sklearn.linear_model import LogisticRegression
import wandb
import time

In [27]:
def train_eval_pipeline(model, train_data, test_data, name):
    # Initialize wandb
    wandb.init(project="Twitch Streamers", name=name)
    # Assign the data
    (X_train, y_train) = train_data
    (X_test, y_test) = test_data

    # Train the model and time the fit
    start = time.time()
    model.fit(X_train, y_train)
    end = time.time() - start
    prediction = model.predict(X_test)

    wandb.log({"accuracy":accuracy_score(y_test, prediction)*100, "precision":precis

    print("Accuracy score of the logistic regression classifier with default hyperpa
    print("\n")
    print("---Classification report of the Logistic Regression Classifier with defau
    print("\n")
    print(classification_report(y_test, prediction, target_names=["Adults", "Non Adu

In [28]:
logreg=LogisticRegression()
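
The features summarized by `data.describe()` span several orders of magnitude (watch time around 1e8, average viewers around 1e3), and the target is imbalanced. One common alternative to the default estimator is sketched below, assuming standardized inputs, `class_weight="balanced"` and a stratified split. The arrays `X_demo` and `y_demo` are synthetic and made up for illustration, they are not the Twitch data, and the W&B logging is left out.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)

# Synthetic stand-in data (NOT the Twitch data): roughly 22% positives,
# one informative feature on a very large scale and one noise feature.
n = 1000
y_demo = (rng.random(n) < 0.22).astype(int)
X_demo = np.column_stack([
    1e8 * (2.0 * y_demo + rng.normal(size=n)),  # informative, huge scale
    rng.normal(size=n),                          # pure noise
])

# stratify keeps the class ratio the same in both parts of the split.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=7, stratify=y_demo
)

# Scale first, then fit a class-weighted logistic regression.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
model.fit(X_tr, y_tr)
pred = model.predict(X_te)

macro_recall = recall_score(y_te, pred, average="macro")
print(f"macro recall on held-out rows: {macro_recall:.2f}")
```

Whether this helps on the real data depends on how much signal the features carry about `Adult`; the sketch only shows the mechanics.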

In [30]:
train_eval_pipeline(logreg, (X_train, y_train), (X_test, y_test), "Logistic_Regressi

Finishing last run (ID:3g3mnlhm) before initializing another...
Waiting for W&B process to finish, PID 25228
Program ended successfully.
Find user logs for this run at: C:\Users\fardo\OneDrive\Desktop\APT3025\wandb\run-20211009_172502-3g3mnlhm\logs\debug.log
Find internal logs for this run at: C:\Users\fardo\OneDrive\Desktop\APT3025\wandb\run-20211009_172502-3g3mnlhm\logs\debug-internal.log

Run summary:
accuracy        78.0
precision       0.39
recall          0.5
training_time   0.07776

Run history:
accuracy        ▁
precision       ▁
recall          ▁
training_time   ▁

Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
Synced Logistic_Regression_Twitch_Streamers: https://wandb.ai/fardosa1904/Twitch%20Streamers/runs/3g3mnlhm
...Successfully finished last run (ID:3g3mnlhm). Initializing new run:

wandb: wandb version 0.12.4 is available! To upgrade, please run:
wandb: $ pip install wandb --upgrade

Tracking run with wandb version 0.12.2
Syncing run Logistic_Regression_Twitch_Streamers to Weights & Biases (Documentation).
Project page: https://wandb.ai/fardosa1904/Twitch%20Streamers
Run page: https://wandb.ai/fardosa1904/Twitch%20Streamers/runs/3nj72a49
Run data is saved locally in C:\Users\fardo\OneDrive\Desktop\APT3025\wandb\run-20211009_172553-3nj72a49

Accuracy score of the logistic regression classifier with default hyperparameter values 78.00%

---Classification report of the Logistic Regression Classifier with default parameter values ---

              precision    recall  f1-score   support

      Adults       0.78      1.00      0.88       156
  Non Adults       0.00      0.00      0.00        44

    accuracy                           0.78       200
   macro avg       0.39      0.50      0.44       200
weighted avg       0.61      0.78      0.68       200
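
The figures in this report are what a classifier produces when it assigns every test row to the majority class. The arithmetic can be checked by hand; the sketch below uses only the supports printed in the report (156 and 44), and the helper `safe_div` is a hypothetical name introduced here.

```python
# Counts taken from the report: 156 + 44 test rows,
# with every row predicted as the majority class.
support_major, support_minor = 156, 44
pred_major, pred_minor = 200, 0


def safe_div(num, den):
    # Report 0.0 when a ratio is undefined (nothing predicted for a class).
    return num / den if den else 0.0


prec_major = safe_div(support_major, pred_major)    # 156 / 200
rec_major = safe_div(support_major, support_major)  # 156 / 156
prec_minor = safe_div(0, pred_minor)                 # 0 / 0, reported as 0.0
rec_minor = safe_div(0, support_minor)               # 0 / 44

accuracy = support_major / (support_major + support_minor)
macro_precision = (prec_major + prec_minor) / 2
macro_recall = (rec_major + rec_minor) / 2

print(f"{accuracy:.2f} {macro_precision:.2f} {macro_recall:.2f}")  # 0.78 0.39 0.50
```

So the 78% accuracy equals the share of the majority class in the test split, and the macro precision of 0.39 and macro recall of 0.50 match the values logged to W&B.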

C:\Users\fardo\anaconda3\lib\site-packages\sklearn\metrics\_classification.py:1245: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
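
The warning appears because one class is never predicted. Note also that `classification_report` applies `target_names` in sorted label order, and `y` is 1 for mature channels, so the first name in the list describes label 0 (not mature). A small sketch with made-up labels, passing `labels` explicitly and setting `zero_division=0`:

```python
from sklearn.metrics import classification_report

# Toy labels (hypothetical): 0 = not mature, 1 = mature.
y_true = [0, 0, 0, 1]
y_pred = [0, 0, 0, 0]  # the minority class is never predicted

report = classification_report(
    y_true,
    y_pred,
    labels=[0, 1],
    target_names=["Not mature", "Mature"],  # same order as `labels`
    zero_division=0,  # report 0.0 for undefined ratios without warning
)
print(report)
```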

Decision Tree for the Data set of Twitch Streamers

In [39]:
from sklearn import tree

In [40]:
testData = data

In [41]:
testData

Out[41]:
     Watch time(Minutes)  Stream time(minutes)  Peak viewers  Average viewers  Followers  Followers gained  Views gained  Adult
0             6196161750                215250        222720            27716    3246298           1734810      93036735      0
1             6091677300                211845        310998            25610    5310163           1370184      89705964      0
2             5644590915                515280        387315            10976    1767635           1023779     102611607      1
3             3970318140                517740        300575             7714    3944850            703986     106546942      0
4             3671000070                123660        285644            29602    8938903           2068424      78998587      0
...                  ...                   ...           ...              ...        ...               ...           ...    ...
995            122524635                 13560         21359             9104     601927            562691       2162107      0
996            122523705                153000          3940              793     213212             52289       4399897      0
997            122452320                217410          6431              567     109068             -4942       3417970      0
998            122311065                104745         10543             1153     547446            109111       3926918      0
999            122192850                 99180         13788             1205     178553             59432       2049420      0

1000 rows × 8 columns

In [42]:
testY= testData['Adult']

In [43]:
testY

Out[43]: 0 0
1 0
2 1
3 0
4 0
..
995 0
996 0
997 0
998 0
999 0
Name: Adult, Length: 1000, dtype: uint8

In [56]:
testX= testData.drop(['Watch time(Minutes)','Adult'], axis= 1)

In [57]:
testX

Out[57]:
     Stream time(minutes)  Peak viewers  Average viewers  Followers  Followers gained  Views gained
0                  215250        222720            27716    3246298           1734810      93036735
1                  211845        310998            25610    5310163           1370184      89705964
2                  515280        387315            10976    1767635           1023779     102611607
3                  517740        300575             7714    3944850            703986     106546942
4                  123660        285644            29602    8938903           2068424      78998587
...                   ...           ...              ...        ...               ...           ...
995                 13560         21359             9104     601927            562691       2162107
996                153000          3940              793     213212             52289       4399897
997                217410          6431              567     109068             -4942       3417970
998                104745         10543             1153     547446            109111       3926918
999                 99180         13788             1205     178553             59432       2049420

1000 rows × 6 columns

In [58]:
clf=tree.DecisionTreeClassifier(criterion='entropy', max_depth = 4)
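
With `criterion='entropy'` the tree picks splits that reduce Shannon entropy of the class distribution. As a worked illustration, the sketch below computes the entropy of a node holding the 156/44 class counts from the logistic-regression test split; the function name `entropy` is introduced here for illustration only.

```python
from math import log2


def entropy(counts):
    """Shannon entropy in bits of a list of class counts."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return sum(-p * log2(p) for p in probs)


# Class counts from the test split reported earlier (156 vs 44).
h_split = entropy([156, 44])

# Reference points: a pure node has zero entropy, a 50/50 node has one bit.
h_pure = entropy([200, 0])
h_even = entropy([100, 100])

print(f"{h_split:.3f} {h_pure:.3f} {h_even:.3f}")
```

A node with a 78/22 class mix sits at roughly 0.76 bits, so there is still impurity for the tree to remove, provided some feature separates the classes.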

In [59]:
clf=clf.fit(X,y)

In [60]:
predY=clf.predict(testX)
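
`clf` was fitted on `X`, which includes 'Watch time(Minutes)', whereas `testX` drops that column and keeps 'Average viewers'. Depending on the scikit-learn version, such a mismatch may be consumed purely by column position or rejected outright, so it is worth checking the column lists before predicting. The helper `feature_mismatch` below is a hypothetical name, and because the definition of `X` is cut off in the export, only its visible column names are used.

```python
def feature_mismatch(fit_columns, predict_columns):
    """Return (missing, unexpected) column names relative to the fit columns."""
    missing = [c for c in fit_columns if c not in predict_columns]
    unexpected = [c for c in predict_columns if c not in fit_columns]
    return missing, unexpected


# The definition of X is truncated in the export, so only its first four
# visible names are listed; the remaining names are unknown here.
fit_cols_visible = [
    "Watch time(Minutes)", "Stream time(minutes)", "Peak viewers", "Followers",
]
# Columns of testX as shown in its printed output.
predict_cols = [
    "Stream time(minutes)", "Peak viewers", "Average viewers",
    "Followers", "Followers gained", "Views gained",
]

missing, _ = feature_mismatch(fit_cols_visible, predict_cols)
print("used in fit but absent at predict time:", missing)
```

Selecting the prediction columns with the same list that built `X` (for example `data[X.columns]`) avoids the problem.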

In [61]:
predY

Out[61]: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=uint8)

In [62]:
predictions= pd.concat([testData[testData.columns[0]], testData['Adult'],pd.Series(p

In [63]:
predictions

Out[63]:
     Watch time(Minutes)  Adult  Predict Adult
0             6196161750      0              0
1             6091677300      0              0
2             5644590915      1              0
3             3970318140      0              0
4             3671000070      0              0
...                  ...    ...            ...
995            122524635      0              0
996            122523705      0              0
997            122452320      0              0
998            122311065      0              0
999            122192850      0              0

1000 rows × 3 columns

In [64]:
from sklearn.metrics import accuracy_score

In [65]:
print('Accuracy on test data is %.2f'%(accuracy_score(testY,predY)*100.))

Accuracy on test data is 77.50
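
Here `testY` and `predY` cover all 1000 rows, including the rows the tree was fitted on, so this figure is not a held-out estimate. A minimal sketch of a held-out evaluation with a majority-class baseline follows; the arrays `X_demo` and `y_demo` are synthetic and made up for illustration, they are not the Twitch data.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(7)

# Synthetic stand-in data (NOT the Twitch data), roughly 22% positives.
n = 1000
y_demo = (rng.random(n) < 0.22).astype(int)
X_demo = np.column_stack([
    y_demo + rng.normal(scale=0.8, size=n),  # informative feature
    rng.normal(size=n),                       # noise feature
])

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=7, stratify=y_demo
)

clf_demo = DecisionTreeClassifier(criterion="entropy", max_depth=4, random_state=7)
clf_demo.fit(X_tr, y_tr)          # fit on the training rows only
pred_te = clf_demo.predict(X_te)  # same columns, unseen rows

tree_acc = accuracy_score(y_te, pred_te)

# Baseline: always predict the most frequent class of the training rows.
majority = np.bincount(y_tr).argmax()
baseline_acc = accuracy_score(y_te, np.full_like(y_te, majority))

print(f"tree: {tree_acc:.3f}  majority baseline: {baseline_acc:.3f}")
```

Comparing against the baseline shows whether the tree adds anything beyond always predicting the majority class.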

In [ ]:
