BSCS [6A]
Abstract
1. Introduction
   Name
   Context
   Content
   Inspiration
   Number of Observations
   Number of Features
3. Objectives
4. Methodology
   Dataset
   Libraries used
   Algorithms used
5. Results
6. Code
7. References
2. Implementation: 10
3. Viva (Oral/Task Based or Both): 15
Total: 30        Signature: ____________
Abstract
This report describes the implementation of data-prediction algorithms such as K-Means, Decision Tree, Random Forest, and Linear Regression. The dataset is obtained from Kaggle; 80% of it is used for training and 20% for testing. The scikit-learn (sklearn) library is used for the main tasks such as prediction and train/test splitting, matplotlib is used for plotting, and pandas for dataset handling.
1. Introduction
Name
CS: GO Round Winner Classification
Context
CS:GO is a tactical shooter in which two teams (CT and Terrorist) play a best of 30 rounds, with each round lasting 1 minute and 55 seconds. There are 5 players on each team (10 in total), and the first team to win 16 rounds wins the game. At the start, one team plays as CT and the other as Terrorist; after 15 rounds, the teams swap sides. There are 7 different maps a game can be played on. The Terrorists win a round either by planting the bomb and making sure it explodes, or by eliminating the other team. The CTs win a round either by eliminating the other team, or by defusing the bomb, should it have been planted.
Content
The dataset was originally published by Skybox as part of their CS:GO AI Challenge, which ran from Spring to Fall 2020. It consists of ~700 demos from high-level tournament play in 2019 and 2020. Warmup rounds and restarts have been filtered out, and for the remaining live rounds a snapshot has been recorded every 20 seconds until the round is decided. Following the initial publication, the data has been pre-processed and flattened to improve readability and make it easier for algorithms to process. The total number of snapshots is 122,411.
Inspiration
• What types of machine learning models perform best on this dataset?
• Which features are most indicative of which team wins the round?
• Are some weapons preferable to others?
• Which attributes should your team have to win: health, armor, or money?
Number of Observations
The total number of observations is 122,411.
Number of Features
The total number of features is 97.
3. Objectives
After completing this project, we learned the following concepts:
• Basics of machine learning
• The NumPy library
• The Matplotlib library
• The scikit-learn (sklearn) library
• The pandas library
• And some other related concepts
4. Methodology
Dataset: Downloaded from Kaggle.com
Libraries used: pandas for dataset handling, NumPy for numerical arrays, Matplotlib for plotting, and scikit-learn for splitting, training, and prediction.
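The dataset handling used throughout this project — loading with pandas, mapping the winner labels to integers, and an 80/20 train/test split with scikit-learn — can be sketched as follows. The small frame below is a hypothetical stand-in for csgo_round_snapshots.csv, with made-up values.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Tiny stand-in for csgo_round_snapshots.csv (hypothetical values).
dataset = pd.DataFrame({
    "time_left":        [115, 60, 30, 90, 10, 75, 45, 20, 100, 55],
    "ct_players_alive": [5, 4, 2, 5, 1, 3, 4, 2, 5, 3],
    "round_winner":     ["CT", "T", "T", "CT", "T", "CT", "CT", "T", "CT", "T"],
})

X = dataset[["time_left", "ct_players_alive"]].values
# Map the winner labels to integers: CT -> 1, T -> 0.
y = dataset["round_winner"].map({"CT": 1, "T": 0}).values

# 80% of the rows for training, 20% held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

print(len(X_train), len(X_test))  # 8 2
```

The same load-map-split pattern is repeated at the start of every script in the Code section.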
Algorithms used:
KMeans: k-means clustering is a method of vector quantization, originally from signal
processing, that aims to partition n observations into k clusters in which each observation
belongs to the cluster with the nearest mean, serving as a prototype of the cluster.
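The nearest-mean assignment described above can be illustrated with a minimal scikit-learn sketch on hypothetical 1-D observations: each point ends up in the cluster whose mean (the cluster's prototype) is closest to it.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two obvious groups of 1-D observations (hypothetical values).
points = np.array([[1.0], [1.2], [0.8], [8.0], [8.3], [7.9]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)

# Each observation is assigned to the cluster with the nearest mean,
# and the cluster centers serve as prototypes of the two groups.
labels = kmeans.labels_
centers = sorted(c[0] for c in kmeans.cluster_centers_)
print(labels, centers)  # centers near 1.0 and 8.07
```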
Random forest: Random forests or random decision forests are an ensemble learning method for classification, regression and other tasks that operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees. Random decision forests correct for decision trees' habit of overfitting to their training set. Random forests generally outperform single decision trees, but their accuracy is lower than that of gradient-boosted trees. However, data characteristics can affect their performance.
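The majority-vote behavior described above can be sketched with a hypothetical toy problem: 25 trees are each fit on a bootstrap sample, and the forest's classification is the mode of their individual votes.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy problem: class 1 whenever the feature exceeds 5.
X = np.array([[1], [2], [3], [4], [6], [7], [8], [9]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# 25 decision trees are built on bootstrap samples of the training data;
# the forest's prediction is the mode (majority vote) of the trees.
forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)
print(forest.predict([[2.5], [7.5]]))  # [0 1]
```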
Decision Tree: The decision tree algorithm belongs to the family of supervised learning algorithms. Unlike many other supervised learning algorithms, it can be used for solving both regression and classification problems. The general motive of using a decision tree is to create a training model that can be used to predict the class or value of target variables by learning decision rules inferred from prior data (training data).
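The idea of learning decision rules from prior (training) data can be sketched as follows, on hypothetical toy data; scikit-learn's export_text shows the rules the tree actually learned.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical training data: the target is 1 when players_alive >= 3.
X = [[1], [2], [3], [4], [5]]
y = [0, 0, 1, 1, 1]

tree = DecisionTreeClassifier().fit(X, y)

# The decision rules inferred from the training data can be inspected
# as text: a single split at players_alive <= 2.5 separates the classes.
print(export_text(tree, feature_names=["players_alive"]))
print(tree.predict([[1], [5]]))  # [0 1]
```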
5. Results
Decision Tree
Random Forest
Linear Regression
KMeans Clustering
6. Code
Decision Tree
import pandas as pd
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

dataset = pd.read_csv("csgo_round_snapshots.csv")
X = dataset[[
    'time_left',
    'ct_score',
    't_score',
    'bomb_planted',
    'ct_health',
    't_health',
    'ct_armor',
    't_armor',
    'ct_money',
    't_money',
    'ct_helmets',
    't_helmets',
    'ct_defuse_kits',
    'ct_players_alive',
    't_players_alive'
]].values

# Map the winner labels to integers: CT -> 1, T -> 0.
y = dataset["round_winner"].map({"CT": 1, "T": 0}).values

# 80% of the dataset for training, 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = DecisionTreeClassifier()
model.fit(X_train, y_train)
prediction = model.predict(X_test)
print("accuracy:", accuracy_score(y_test, prediction) * 100)

i = 100
AV = "Terrorists" if y[i] == 0 else "Counter-Terrorists"
PV = "Terrorists" if model.predict(X)[i] == 0 else "Counter-Terrorists"
print("According to actual value, {} win.".format(AV))
print("According to prediction, {} win.".format(PV))
Random Forest
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn import metrics

dataset = pd.read_csv("csgo_round_snapshots.csv")
X = dataset[[
    'time_left',
    'ct_score',
    't_score',
    'bomb_planted',
    'ct_health',
    't_health',
    'ct_armor',
    't_armor',
    'ct_money',
    't_money',
    'ct_helmets',
    't_helmets',
    'ct_defuse_kits',
    'ct_players_alive',
    't_players_alive'
]].values

# Map the winner labels to integers: CT -> 1, T -> 0.
y = dataset["round_winner"].map({"CT": 1, "T": 0}).values

# 80% of the dataset for training, 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

regressor = RandomForestRegressor(n_estimators=100)
regressor.fit(X_train, y_train)
prediction = regressor.predict(X_test)
print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, prediction))
print('Mean Squared Error:', metrics.mean_squared_error(y_test, prediction))

i = 200
AV = "Terrorists" if y[i] == 0 else "Counter-Terrorists"
# The regressor's output is continuous, so values below 0.5 count as T.
PV = "Terrorists" if regressor.predict(X)[i] < 0.5 else "Counter-Terrorists"
print("According to actual value, {} win.".format(AV))
print("According to prediction, {} win.".format(PV))
Linear Regression
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import linear_model
from sklearn.model_selection import train_test_split
from sklearn import metrics

dataset = pd.read_csv("csgo_round_snapshots.csv")
X = dataset[[
    'time_left',
    'ct_score',
    't_score',
    'bomb_planted',
    'ct_health',
    't_health',
    'ct_armor',
    't_armor',
    'ct_money',
    't_money',
    'ct_helmets',
    't_helmets',
    'ct_defuse_kits',
    'ct_players_alive',
    't_players_alive'
]].values

# Map the winner labels to integers: CT -> 1, T -> 0.
y = dataset["round_winner"].map({"CT": 1, "T": 0}).values

# 80% of the dataset for training, 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = linear_model.LinearRegression()
model.fit(X_train, y_train)
predicted = model.predict(X_test)
print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, predicted))

i = 100
AV = "Terrorists" if y[i] == 0 else "Counter-Terrorists"
# The regression output is continuous, so values below 0.5 count as T.
PV = "Terrorists" if model.predict(X)[i] < 0.5 else "Counter-Terrorists"
print("According to actual value, {} win.".format(AV))
print("According to prediction, {} win.".format(PV))

# Measured vs. predicted values for the held-out test set.
fig, ax = plt.subplots()
ax.scatter(y_test, predicted)
ax.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'k--', lw=3)
ax.set_xlabel('Measured')
ax.set_ylabel('Predicted')
plt.show()
KMeans Clustering
import pandas as pd
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
from sklearn import metrics

dataset = pd.read_csv("csgo_round_snapshots.csv")
X = dataset[[
    'time_left',
    'ct_score',
    't_score',
    'bomb_planted',
    'ct_health',
    't_health',
    'ct_armor',
    't_armor',
    'ct_money',
    't_money',
    'ct_helmets',
    't_helmets',
    'ct_defuse_kits',
    'ct_players_alive',
    't_players_alive'
]].values

# Map the winner labels to integers: CT -> 1, T -> 0.
y = dataset["round_winner"].map({"CT": 1, "T": 0}).values

# 80% of the dataset for training, 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

kmeans = KMeans(n_clusters=2, n_init=10)
kmeans.fit(X_train)
prediction = kmeans.predict(X_test)
# Note: K-Means assigns cluster ids arbitrarily, so the reported accuracy
# may correspond to the inverted labeling (100 minus the true value).
print("predictions: ", prediction)
print("actual: ", y_test)
print("accuracy: ", accuracy_score(y_test, prediction) * 100)
print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, prediction))
print('Mean Squared Error:', metrics.mean_squared_error(y_test, prediction))
print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, prediction)))

i = 100
AV = "Terrorists" if y[i] == 0 else "Counter-Terrorists"
PV = "Terrorists" if kmeans.predict(X)[i] == 0 else "Counter-Terrorists"
print("According to actual value, {} win.".format(AV))
print("According to prediction, {} win.".format(PV))

# Reduce the data to two dimensions so the clusters can be visualized.
reduced_data = PCA(n_components=2).fit_transform(X)
kmeans = KMeans(init='k-means++', n_clusters=2, n_init=10)
kmeans.fit(reduced_data)

# Plot the decision boundary by assigning a color to each point of a mesh.
h = 0.5  # step size of the mesh
x_min, x_max = reduced_data[:, 0].min() - 1, reduced_data[:, 0].max() + 1
y_min, y_max = reduced_data[:, 1].min() - 1, reduced_data[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

# Obtain labels for each point in the mesh using the last trained model.
Z = kmeans.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.imshow(Z, interpolation='nearest',
           extent=(xx.min(), xx.max(), yy.min(), yy.max()),
           aspect='auto', origin='lower')
plt.plot(reduced_data[:, 0], reduced_data[:, 1], 'k.', markersize=2)
plt.title("K-Means clustering on the PCA-reduced dataset")
plt.show()
7. References
https://www.youtube.com
https://www.stackoverflow.com
https://towardsdatascience.com