Bhushan Kapadne
Information Technology
SVKM’s Institute of Technology
Dhule, Maharashtra, India
bhushan.kapadnee@gmail.com

Gaurav Borse
Information Technology
SVKM’s Institute of Technology
Dhule, Maharashtra, India
gauravborse612@gmail.com
Abstract — As is widely recognized, the popularity of the Dream11 platform is steadily increasing, particularly during the Indian Premier League (IPL) season. The platform serves as an exciting avenue for cricket enthusiasts to engage in predictive team selection, offering opportunities to earn rewards. However, many users, including those with limited platform knowledge, create virtual teams that often result in losses, with occasional wins attributable to luck. A solution rooted in machine learning aims to address this challenge by building a model capable of predicting Dream11 fantasy cricket teams for IPL matches. The prediction relies on a multitude of influential factors affecting match outcomes. In developing this solution, machine learning algorithms such as XGBoost, CatBoost, and Random Forest, all supervised ensemble methods, played a pivotal role. Furthermore, we incorporated historical records and live updates, with ESPN CricInfo serving as the primary source of real-time match information.

… the team for every match. In this paper we propose a system that predicts the team based on the players' current form, their past records against the opposing team, pitch conditions, weather updates, the playing XI of both teams, injury updates, and the platform's team-creation rules, among other factors.

II.
B. Database Module

Fig. 3. Batsman Dataset

Before implementing the project, we need to clean the data, i.e., correct inaccurate details such as players' previous and current names, newly added teams, and data about new players and retired players. Players who were not involved in batting or bowling may have null values in the dataset, and these needed to be handled. The dataset also lacks some details, such as batsman strike rate, bowling economy, batting average, stumpings, runs conceded by a bowler, and the number of 6s, 4s, 50s, and 100s. To calculate these factors, we used the following formulas:

Batting Average = Total Runs / No. of Innings played
Bowling Economy = Total Runs conceded / Total Overs bowled
Strike Rate = (Runs scored / Balls faced) * 100

Other factors are calculated in a similar way, and from them a new dataset is generated. After collecting and preparing the data, we moved on to training. Here we used the XGBoost, CatBoost, and Random Forest algorithms.

XGBoost: In [5], XGBoost is described as a supervised learning method in which training data with multiple features, say inputs Xi, are used to predict a target variable, say yi.

Negative Gradient: Each new tree f_t is fitted to the negative gradient of the loss with respect to the current ensemble prediction:

r_i = − ∂L(y_i, F_(t-1)(x_i)) / ∂F_(t-1)(x_i)

Update Step: To prevent overfitting, each new decision tree is scaled by a learning rate:

F_t(x) = F_(t-1)(x) + η · f_t(x)    (4)

where,
η = learning rate
f_t(x) = prediction of the new (t-th) tree
F_t(x) = prediction of the ensemble after the t-th tree
F_(t-1)(x) = prediction of the ensemble up to the (t-1)-th tree

Regularization: To prevent overfitting, a regularization term is added to the objective:

Ω(f) = γ · |q| + (λ / 2) · Σ_q w_q²    (5)

where,
γ and λ = regularization hyperparameters
|q| = number of data points in leaf q
w_q = weight of leaf q

Final Prediction: The final prediction is the sum of all decision trees in the ensemble:

F_T(x) = Σ_t η · f_t(x)    (6)

CatBoost: In [5], CatBoost is a high-performance open-source library based on gradient-boosted decision trees. It is mainly used for problems such as regression and classification that involve a number of independent features, and it can handle both categorical and numerical features.

Objective Function: To find the best model, CatBoost minimizes a specific objective function:

Objective = Σ_i W_i · L(y_i, ȳ_i) + Ω(f)    (7)

where,
y_i = true label of the i-th instance
W_i = weight of the i-th instance (player)
ȳ_i = predicted label by the current ensemble
f = ensemble of trees
Ω(f) = regularization term
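As a concrete illustration of the boosting update (4) and the ensemble prediction (6), the following is a minimal from-scratch sketch using squared-error loss and one-feature regression stumps. It is a toy stand-in for the XGBoost/CatBoost pipelines described above, and the data values are made up:

```python
# Minimal gradient-boosting sketch (toy illustration, not the paper's actual
# XGBoost/CatBoost pipeline): squared-error loss, one-feature regression
# stumps, and the update F_t(x) = F_(t-1)(x) + eta * f_t(x).

def fit_stump(xs, residuals):
    """Fit a one-split regression stump minimising squared error."""
    best = None
    for split in xs:
        left = [r for x, r in zip(xs, residuals) if x <= split]
        right = [r for x, r in zip(xs, residuals) if x > split]
        if not left or not right:
            continue
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        err = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, split, lmean, rmean)
    _, split, lmean, rmean = best
    return lambda x: lmean if x <= split else rmean

def boost(xs, ys, n_trees=60, eta=0.3):
    """Return an ensemble predictor F_T(x) = sum over trees of eta * f_t(x)."""
    trees = []
    preds = [0.0] * len(xs)
    for _ in range(n_trees):
        # for squared error, the negative gradient is the residual y_i - F_(t-1)(x_i)
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        preds = [p + eta * tree(x) for p, x in zip(preds, xs)]  # eq. (4)
    return lambda x: sum(eta * t(x) for t in trees)  # eq. (6)

# Made-up data: a single feature (recent batting average) vs. fantasy points.
xs = [10.0, 20.0, 30.0, 40.0, 50.0]
ys = [15.0, 25.0, 40.0, 55.0, 70.0]
model = boost(xs, ys)
```

With enough rounds, the ensemble's predictions converge toward the training targets, which is exactly the refinement process the update step describes.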
CatBoost combines all of these steps and strategies to refine the ensemble of trees.

Random Forest: In [1], the Random Forest technique is applicable to both classification and regression tasks. It constructs decision trees from different data samples and, for classification, relies on their majority vote, while for regression it averages their outputs. It is an ensemble of multiple decision trees used to generate a prediction or classification; by combining the outputs of these trees, we obtain a more accurate result.

Machine learning models are often black boxes, which makes their interpretation difficult. SHAP (SHapley Additive exPlanations) values are a method used to increase the transparency and interpretability of an ML model. They explain how a specific value of a certain feature influences a prediction, in contrast to what the prediction would be if that feature took some baseline value.
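The contrast-with-baseline idea behind SHAP can be shown with an exact Shapley computation for a tiny hypothetical two-feature model. This is a from-scratch sketch, not the shap library, and the model, feature names, and values are all made up:

```python
from itertools import combinations
from math import factorial

# Hypothetical toy model: predicted fantasy points from two features.
def predict(strike_rate, bowling_economy):
    return 0.5 * strike_rate - 2.0 * bowling_economy + 10.0

# Baseline (e.g. league-average) feature values vs. the player being explained.
baseline = {"strike_rate": 120.0, "bowling_economy": 8.0}
player = {"strike_rate": 150.0, "bowling_economy": 6.0}

def value(coalition):
    """Model output when features in `coalition` take the player's values
    and all other features stay at their baseline values."""
    feats = {f: (player[f] if f in coalition else baseline[f]) for f in baseline}
    return predict(feats["strike_rate"], feats["bowling_economy"])

def shapley(feature):
    """Exact Shapley value: weighted average of the feature's marginal
    contribution over all coalitions of the remaining features."""
    others = [f for f in baseline if f != feature]
    n = len(baseline)
    total = 0.0
    for k in range(len(others) + 1):
        for subset in combinations(others, k):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            total += weight * (value(set(subset) | {feature}) - value(set(subset)))
    return total

phi_sr = shapley("strike_rate")      # 0.5 * (150 - 120) = 15.0
phi_be = shapley("bowling_economy")  # -2.0 * (6 - 8) = 4.0
```

For a linear model each Shapley value reduces to coefficient times (player value minus baseline value), and the two attributions sum exactly to the difference between the player's prediction and the baseline prediction, which is the additive property the paper relies on for interpretability.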
D. Model Selection

Once our models provided the projected points earned by each player in the match, we proceeded to implement all the necessary rules for team selection. To achieve this, we used linear programming with the PuLP library (used to solve linear programming problems), and team selection was based on the Dream11 credit points of each player. The major problems we faced were the continuous change of each player's credit points, and the selection of the Captain (2x points) and Vice-Captain (1.5x points). Our solution was to keep the top 2 players with the most credit points as the Captain and Vice-Captain. The Python PuLP library played the main role here.

The whole problem was formulated as a linear program. Suppose we have a pool of 11 players from each team, 22 in total, with performance measured in Dream11 points. The variables are:

bti = Batsman indicator
bli = Bowler indicator
alli = All-Rounder indicator
wkti = Wicket-Keeper indicator
spi = Spin-Bowler indicator
pi = performance of the i-th player
btlb = Lower bound for Batsmen
btub = Upper bound for Batsmen
bllb = Lower bound for Bowlers
blup = Upper bound for Bowlers
alllb = Lower bound for All-Rounders
allub = Upper bound for All-Rounders
wktlb = Lower bound for Wicket-Keepers
wktub = Upper bound for Wicket-Keepers
Ti = Team of the i-th player
b = Selected player
C = Credit limit amount
T0 = Number of players from team 1

First, we categorised players according to their roles as Batsman, Bowler, All-Rounder, and Wicket-Keeper; the binary variables defining them are bti, bli, alli, and wkti respectively. Players were indexed from 0, and the i-th player's performance was denoted by pi. A value of one for a particular variable means it is true; for example, if bli is 1, the player is a bowler. Only when the player is a bowler does the spi variable have meaning: it marks a spin bowler, and otherwise the bowler is a fast bowler. Dream11 also puts limits on player selection, so we defined variables for these bounds: for Batsmen, btlb (lower bound) and btub (upper bound), and similarly bllb, blup, alllb, allub, wktlb, and wktub for the other roles. Ti indicates which team a player was chosen from: if Ti is 0, the player is from team 2, otherwise from team 1.

IV. RESULT ANALYSIS

After receiving from our models the projected points earned by each player in a match, we had to apply the aforementioned rules and constraints to determine the best Dream11 team. To accomplish this, we employed linear programming using the PuLP library. The selection of the 11 players was based primarily on two factors: the Dream11 points expected for each player and the credit cost associated with each player. The credit cost was especially crucial in high-profile matches such as MI vs. CSK. For instance, you might want to include expensive players like Rohit Sharma, Ruturaj Gaikwad, D Conway, Jasprit Bumrah, and MS Dhoni, all of whom cost more than 10 credit points; however, it was essential to ensure that this did not deplete the credit budget, leaving insufficient credits to assemble a competitive remaining team. Out of the 11 players chosen, our system would designate the top 2 players as the Captain and Vice-Captain, awarding them 2x and 1.5x points, respectively, for the match.

… unveil patterns within the data, highlighting which bowlers excel against which batsmen in game scenarios. This valuable insight can then be utilized by coaches and teams to make decisions during matches.
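The paper solves this selection with PuLP; as an illustration of the same constraint structure (role bounds, a credit cap, and 2x/1.5x Captain and Vice-Captain multipliers), here is a brute-force sketch on a small hypothetical pool. All names, roles, credits, and projected points below are made up, and the pool is shrunk to 8 players choosing 4 so the search stays tiny:

```python
from itertools import combinations

# Hypothetical pool: (name, role, credit cost, projected Dream11 points).
# The real system selects 11 from 22 via PuLP; this sketch picks 4 from 8.
pool = [
    ("bat1", "BAT", 10.5, 55.0), ("bat2", "BAT", 8.0, 40.0),
    ("bwl1", "BWL", 9.0, 48.0), ("bwl2", "BWL", 7.5, 35.0),
    ("ar1",  "ALL", 9.5, 50.0), ("ar2",  "ALL", 8.5, 42.0),
    ("wk1",  "WK",  8.0, 38.0), ("wk2",  "WK",  7.0, 30.0),
]
TEAM_SIZE, CREDIT_CAP = 4, 36.0
BOUNDS = {"BAT": (1, 2), "BWL": (1, 2), "ALL": (1, 2), "WK": (1, 1)}

def team_points(team):
    """Projected total with 2x for the Captain and 1.5x for the
    Vice-Captain (the two highest projected scorers)."""
    pts = sorted((p for *_, p in team), reverse=True)
    return sum(pts) + pts[0] + 0.5 * pts[1]

def feasible(team):
    """Check the credit cap and the per-role lower/upper bounds."""
    if sum(c for _, _, c, _ in team) > CREDIT_CAP:
        return False
    for role, (lo, hi) in BOUNDS.items():
        n = sum(1 for _, r, _, _ in team if r == role)
        if not lo <= n <= hi:
            return False
    return True

best = max((t for t in combinations(pool, TEAM_SIZE) if feasible(t)),
           key=team_points)
```

Brute force is only workable for such toy sizes; the same objective and constraints expressed as binary variables are exactly what an integer linear program in PuLP optimizes efficiently for the full 22-player pool.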
Fig. 9. Predicted Result
A. Performance Evaluation