Report Gurkirat
Landran, Mohali-140307
Submitted by:
TRAINING INCHARGE
Designation …………………
Department …………………
Seal ……………………………….
SUBMITTED BY
Name ………………
Branch ………………
2
PHOTOCOPY OF CERTIFICATE OF TRAINING
3
ACKNOWLEDGEMENT
4
DECLARATION
I hereby declare that the project entitled “IPL Win Predictor” is an authentic record of my work
carried out as part of the requirements for the award of the degree of Bachelor of Technology (ECE) at
Chandigarh Engineering College, Landran (Mohali), under the guidance of Er. Dev Pandya from
TCIL-IT during the four-week summer training held from 21/07/2022 to 12/08/2022.
(Signature of student)
Name & Roll No
5
LIST OF FIGURES
1.1 10
1.2 11
2.1 14
2.2 14
2.3 15
2.4 16
2.5 16
2.6 17
2.7 18
2.8 18
2.9 19
2.10 20
4.1 33
5.1 34
5.2 35
5.3 44
6
LIST OF TABLES
4.1 27
7.1 46 - 49
7
TABLE OF CONTENTS
1. Chapter-1
Company Profile
7 Appendix 46 - 49
9
CHAPTER-1
COMPANY PROFILE
Fig. 1.1
TCIL-IT is a leading provider of six-month and six-week industrial training for students in
Chandigarh. It is the training division of TCIL, a premier engineering organization and a
Government of India enterprise under the Ministry of Communication and Information
Technology, under the administrative control of the Department of Telecommunications (DoT).
TCIL was started in the year 1978. In 1999, ICS initiated the six-month/six-week training
division with TCIL-IT, which is managed by ICSIL in Chandigarh. ICSIL is a joint venture of
Delhi State Industrial Infrastructure Development Corporation (DSIIDC), an undertaking of the
Delhi Government, and Telecommunications Consultants India Limited (TCIL) itself.
TCIL-IT Chandigarh is one of the fastest-emerging companies in the IT and telecommunications industry.
As a well-accredited company, we specialize in a range of industrial training
programs and have maintained a strong foothold since our inception in 1999
with ICS, Chandigarh. Intelligent Communication Systems India Ltd. (ICSIL) controls its
managerial and administrative aspects. ICSIL is a joint venture of the Delhi State Industrial
Infrastructure Development Corporation, an undertaking of the Delhi Government, and TCIL, which
is an enterprise of the Government of India under the Ministry of Communication and
Information Technology, New Delhi. We offer training in various technical areas of
specialization for B.Tech students of CSE/IT/ECE/Electrical, and for Diploma
or MCA/MSc-IT students. The duration of training ranges from 6 weeks to 6 months, depending on
the training criteria of the university or institute.
By educating skilled IT and Telecom professionals capable of fulfilling the requirements of
industry, we aim to serve our nation. Our entire staff consists of professionals with
extensive experience in the industry. With years of experience and thorough knowledge, these
professionals help students in their overall professional growth and in developing the right
attitude by sharing their knowledge. At TCIL-IT Chandigarh, stress
is laid on practical knowledge and training, and students are given the
opportunity to work on live projects under people with expertise in the respective field.
Scheduled work forms the basis of our working and we firmly believe in it. All forms of training
at TCIL-IT follow a fixed schedule so that all students get complete
information in a well-structured manner. We have a strong track record of students trained
by us who have achieved success in the industry in their field of expertise. We have industrial
relations and associations with several top-notch MNCs and we try to provide nothing but
the best to our students.
1.2 Vision and Mission:
To excel and maintain leadership in providing Optimal solutions for Telecommunications and
Information Technology Service Sector globally and to diversify by providing excellent
infrastructure facilities particularly in the high-tech areas.
Fig. 1.2
11
TCIL-IT provides industrial training in the following domains:
IT & CSE:
Core and Advanced Java (J2EE or J2ME)
Advance .Net Technology (ASP.Net, C#, VB) with SQL Server
Practical Web Designing Applications
IT Security & Ethical Hacking
Android Application Development Program
Oracle DBA
Core & Advance PHP (Web Development Applications)
Software Testing (Manual / Automatic)
Networking Technology (MCITP / CCNA)
Cloud Computing
Electronics and Communication:
Understanding Emerging Wireless Communication Technologies
Embedded System – 8/16 bit Microcontroller | PIC | ARM
Wireless Communications
Advanced Telecom Training
Embedded System (GSM)
VLSI
Networking (Intranet/LAN/WAN), Linux Administration
PLC Automation
Electrical Engineering:
Electrical Plant Design Engineering.
1.4 Address:
TCIL-IT (ICS)
S.C.O. 3017-18, Second Floor
Opp. Kisan Bhavan
Sector 22D, Chandigarh- 160 022
Email: tcilchd@gmail.com
Telephone: 9876795015, 9781554540
12
Chapter-2
Artificial Intelligence comprises two words, “Artificial” and “Intelligence”. Artificial refers
to something made by humans rather than occurring naturally, and Intelligence means the ability
to understand or think. A common misconception is that Artificial Intelligence is itself a system; it is
not. AI is implemented in a system. There are many definitions of AI. One
of them is: “It is the study of how to train computers so that they can do things
which, at present, humans can do better.” In other words, AI is the attempt to give a
machine the capabilities that humans have.
Machine Learning is learning in which a machine learns on its own without being
explicitly programmed. It is an application of AI that gives a system the ability to
automatically learn and improve from experience. Instead of writing the program by hand, we supply
example inputs together with their expected outputs, and the learning algorithm produces the
program (the model) that maps one to the other. A widely used definition of Machine Learning is:
“A computer program is said to learn from experience E with respect to some class of tasks T and a
performance measure P if its performance at tasks in T, as measured by P,
improves with experience E.”
Deep Learning is a subset of Machine Learning. It is based on learning by example, just like
humans do, using Artificial Neural Networks. These Artificial Neural Networks are created to
mimic the neurons in the human brain so that Deep Learning algorithms can learn much more
efficiently. Deep Learning is popular now because of its wide range of applications in
modern technology. From self-driving cars to image recognition, speech recognition, and natural language
processing, Deep Learning is used to achieve results that were not possible before.
13
Fig. 2.1
Fig. 2.2
14
2.4 Popular Machine Learning Algorithms :
Let’s look at some of the popular Machine Learning algorithms that are based on specific
types of Machine Learning.
Supervised learning algorithms are the most commonly used ML algorithms. During
training, such an algorithm takes data samples (the training data) together with the output
associated with each sample (its label or response). Examples of
supervised machine learning algorithms include Decision Tree, Random Forest, KNN, and
Logistic Regression.
Fig. 2.3
Supervised Machine Learning includes Regression and Classification algorithms. Some of the
more popular algorithms in these categories are:
1. Linear Regression Algorithm
The Linear Regression Algorithm models the relation between an independent and a
dependent variable. It shows how the dependent variable changes when the
independent variable is changed. The independent variable is called the
explanatory variable and the dependent variable is called the factor of interest.
15
Fig. 2.4
An example of the Linear Regression Algorithm usage is to analyze the property prices in the
area according to the size of the property, number of rooms, etc.
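The property-price example can be sketched with ordinary least squares written out by hand. This is only an illustration: the sizes and prices are made-up numbers, and `fit_line` is a hypothetical helper name, not code from the project.

```python
# Ordinary least squares for a single feature, written out by hand.
# The sizes and prices below are made-up illustration data.

def fit_line(xs, ys):
    """Return (slope, intercept) of the line minimising the squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

sizes = [50, 60, 80, 100, 120]      # property size (hypothetical units)
prices = [150, 180, 240, 300, 360]  # price is exactly 3 * size, for clarity

slope, intercept = fit_line(sizes, prices)
predicted = slope * 90 + intercept  # estimated price for a size of 90
print(slope, intercept, predicted)
```

Because the toy prices lie exactly on a line, the fitted slope recovers the factor 3 and the intercept is zero. Real data would scatter around the line.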
2. Logistic Regression Algorithm
Fig. 2.5
16
The Logistic Regression Algorithm predicts discrete values, whereas the Linear Regression
Algorithm predicts continuous values. This makes Logistic Regression the
better option for binary classification. An event is classified as 1 if it
occurs and as 0 otherwise, and the probability of the event
occurring is predicted from the given predictor variables. An example of
Logistic Regression in use is in medicine, to predict whether a breast
tumor is malignant based on its size.
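As a minimal sketch of the tumor-size example, the code below fits a one-feature logistic regression with plain gradient descent. The sizes, labels, learning rate and iteration count are all assumptions chosen for illustration; a library such as scikit-learn would normally be used instead.

```python
import math

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Made-up tumor sizes with labels: 1 = malignant, 0 = benign.
sizes = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
labels = [0, 0, 0, 1, 1, 1]

mean_size = sum(sizes) / len(sizes)
xs = [s - mean_size for s in sizes]  # centre the feature to help training

w, b = 0.0, 0.0
learning_rate = 0.1
for _ in range(1000):
    # Gradient of the average log-loss with respect to w and b.
    grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, labels)) / len(xs)
    grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, labels)) / len(xs)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

def predict_proba(size):
    """Estimated probability that a tumor of this size is malignant."""
    return sigmoid(w * (size - mean_size) + b)

def predict(size):
    """Classify as 1 when the estimated probability reaches 0.5."""
    return 1 if predict_proba(size) >= 0.5 else 0

print(predict(2.0), predict(5.0))
```

The model outputs a probability, and the 0/1 class comes from thresholding that probability at 0.5.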
Fig. 2.6
3. Naive Bayes Classifier Algorithm
The Naive Bayes Classifier Algorithm is used to classify text data such as a web page, a
document, or an email. It is based on Bayes' Theorem of
Probability and assigns each item to the most probable of the categories that
are available. A typical use of the Naive Bayes Classifier Algorithm is email spam
filtering, where an email service classifies each email as Spam or Not Spam.
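The spam-filtering idea can be sketched as a tiny word-count Naive Bayes classifier with add-one smoothing. The four training messages are invented, and the function names are hypothetical; this is not how any particular email service implements its filter.

```python
import math
from collections import Counter

# Tiny made-up training set: (text, label).
training = [
    ("win money now", "spam"),
    ("free prize win", "spam"),
    ("meeting at noon", "ham"),
    ("lunch meeting tomorrow", "ham"),
]

word_counts = {"spam": Counter(), "ham": Counter()}
doc_counts = Counter()
for text, label in training:
    doc_counts[label] += 1
    word_counts[label].update(text.split())

vocabulary = {word for counts in word_counts.values() for word in counts}

def log_score(text, label):
    """log P(label) plus the sum of log P(word | label), with add-one smoothing."""
    total_words = sum(word_counts[label].values())
    score = math.log(doc_counts[label] / len(training))
    for word in text.split():
        count = word_counts[label][word]  # 0 for words never seen with this label
        score += math.log((count + 1) / (total_words + len(vocabulary)))
    return score

def classify(text):
    """Pick the label with the higher posterior score."""
    return max(("spam", "ham"), key=lambda label: log_score(text, label))

print(classify("win a free prize"))
print(classify("meeting tomorrow"))
```

The "naive" part is the assumption that words are independent given the label, which is why the per-word log-probabilities are simply added.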
17
Fig. 2.7
Unsupervised Machine Learning mainly includes Clustering algorithms. Some of the more
popular algorithms in this category are:
Fig. 2.8
18
1. K Means Clustering Algorithm
Let’s imagine that you want to search for the name “Harry” on Wikipedia. “Harry” can
refer to Harry Potter, Prince Harry of England, or any other popular Harry on Wikipedia. Grouping
the web pages that talk about the same subject is a clustering problem, and the K Means Clustering
Algorithm is a popular algorithm for such cluster analysis. In general, the K Means Clustering Algorithm
partitions a given data set into K clusters, so the output
contains K clusters with the input data divided among them.
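The partitioning into K clusters can be sketched as follows. This is a bare-bones version under stated assumptions: 2-D points, squared Euclidean distance, a fixed number of iterations, and the first K points used as starting centroids (real implementations usually choose the start more carefully). The points are made up.

```python
def k_means(points, k, iterations=10):
    """Plain K-means on 2-D points; the first k points seed the centroids."""
    centroids = list(points[:k])
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:  # keep the old centroid if a cluster is empty
                centroids[i] = (
                    sum(p[0] for p in cluster) / len(cluster),
                    sum(p[1] for p in cluster) / len(cluster),
                )
    return centroids, clusters

# Made-up points forming two visibly separate groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (1.0, 1.5), (8.5, 9.0)]
centroids, clusters = k_means(points, k=2)
print(centroids)
print(clusters)
```

Each iteration alternates between assigning points to the nearest centroid and recomputing the centroids, which is the core loop of the algorithm.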
2. Apriori Algorithm
Fig. 2.9
The Apriori Algorithm uses the if-then format to create association rules. This means that if a
certain event 1 occurs, then there is a high probability that a certain event 2 also occurs. For
example: IF someone buys a car, THEN there is a high chance they buy car insurance as well.
The Apriori Algorithm generates this association rule by counting how many of the people who
bought a car also bought car insurance. A search auto-complete feature illustrates the same
idea: when a word is typed, the words that are usually typed after it are looked up
and displayed as suggestions.
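The car-and-insurance rule can be made concrete with the two quantities that association-rule mining relies on, support and confidence. The sketch below only scores one rule on invented baskets; it does not implement the full Apriori candidate-generation procedure.

```python
# Made-up purchase records; each set is one customer's basket.
transactions = [
    {"car", "insurance"},
    {"car", "insurance", "gps"},
    {"car"},
    {"insurance"},
    {"car", "insurance"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent) = support(both) / support(antecedent)."""
    return support(antecedent | consequent) / support(antecedent)

rule_support = support({"car", "insurance"})          # 3 of the 5 baskets
rule_confidence = confidence({"car"}, {"insurance"})  # 3 of the 4 car buyers
print(rule_support, rule_confidence)
```

A rule such as IF car THEN insurance is kept only when both values exceed thresholds chosen by the analyst.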
Artificial Neural Networks are modeled after the neurons in the human brain. They contain
artificial neurons, which are called units. These units are arranged in a series of layers that
together constitute the whole Artificial Neural Network. A layer can have anywhere from a
dozen units to millions of units, depending on the complexity of the system. Commonly,
Artificial Neural Networks have an input layer, an output layer, and hidden layers. The
input layer receives data from the outside world, which the neural network needs to analyze or
learn about.
Fig. 2.10
Then this data passes through one or multiple hidden layers that transform the input into data
that is valuable for the output layer. Finally, the output layer provides an output in the form of
a response of the Artificial Neural Networks to input data provided.
In the majority of neural networks, units are interconnected from one layer to another. Each of
these connections has weights that determine the influence of one unit on another unit. As the
data transfers from one unit to another, the neural network learns more and more about the
data which eventually results in an output from the output layer.
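The flow of data from the input layer through a hidden layer to the output layer can be sketched as a forward pass. The weights and biases below are hand-picked, hypothetical values (a trained network would learn them), and the sigmoid activation is an assumption.

```python
import math

def sigmoid(z):
    """Activation function that maps a weighted sum into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    """One fully connected layer: each unit takes a weighted sum, then a sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(unit_weights, inputs)) + bias)
        for unit_weights, bias in zip(weights, biases)
    ]

# Hypothetical weights: 2 inputs -> 3 hidden units -> 1 output unit.
hidden_weights = [[0.5, -0.2], [0.1, 0.8], [-0.3, 0.4]]
hidden_biases = [0.0, 0.1, -0.1]
output_weights = [[0.7, -0.5, 0.2]]
output_biases = [0.05]

inputs = [1.0, 2.0]
hidden = layer_forward(inputs, hidden_weights, hidden_biases)
output = layer_forward(hidden, output_weights, output_biases)
print(hidden, output)
```

Each connection weight scales the influence of one unit on a unit in the next layer. Training would adjust these weights, which this sketch leaves out.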
Machine Learning is among the most rapidly growing technologies, and according to researchers we are in
the golden years of AI and ML. It is used to solve many complex real-world problems that
cannot be solved with traditional approaches. The following are some real-world applications of ML:
Emotion analysis
Sentiment analysis
Error detection and prevention
Weather forecasting and prediction
Stock market analysis and forecasting
Speech synthesis
Speech recognition
Customer segmentation
Object recognition
Fraud detection
Fraud prevention
Recommendation of products to customer in online shopping.
21
CHAPTER-3
FEASIBILITY STUDY
Feasibility is the determination of whether or not a project is worth doing. The process followed
in making this determination is called a feasibility study. It determines whether a project can
and should be undertaken. Once it has been determined that a project is feasible, the analyst can go
ahead and prepare the project specification, which finalizes the project requirements. Generally,
feasibility studies are undertaken within tight time constraints and normally culminate in a written
and oral feasibility report. The contents and recommendations of such a study will be used as a
sound basis for deciding whether to proceed, postpone or cancel the project. Thus, since the
feasibility study may lead to the commitment of large resources, it becomes necessary that it
should be conducted competently and that no fundamental errors of judgment are made.
• Technical feasibility
• Economic feasibility
• Operational feasibility
Technical feasibility is concerned with specifying the equipment, software and hardware that will successfully
satisfy the user requirements. The technical needs of the system may vary considerably, but might
include:
In examining technical feasibility, the configuration of the system is given more importance than the
actual make of hardware. The configuration should give the complete picture of the system
requirements, for example what speeds of input and output should be achieved at a particular quality of
printing.
22
According to the definition of technical feasibility, the compatibility between front-end and
back-end is very important. In our project the compatibility of both is very good: the degree of
compatibility of JSP and SQL Server 2000 is very good. The speed of output is also very good: when
we enter the data and click a button, the response is very fast. I never
found difficulty when using complex queries or heavy transactions; the speed of transactions is
always smooth and constant. This software also provides the facility to communicate data to distant
locations.
We use Java Server Pages and JavaScript. The design of the front-end of any project is very
important, so we selected Java Server Pages and JavaScript as the front-end for the following reasons:
At present a number of back-ends are available, but I have selected SQL Server 2000 for
the following reasons:
With the help of the above support we remove the defects of the existing software. In future we can easily
switch over to any platform. The system is designed so that it does not halt in case of undesired situations or
events. A problem in any one module does not affect the other modules of the system. A change of
hardware does not produce problems.
Operational feasibility is mainly related to human, organizational and political aspects. The points to be considered are:
• What changes will be brought with the system? What organizational structures are
disturbed?
• What new skills will be required?
• Do the existing staff members have these skills?
• If not, can they be trained in due course of time?
23
At the present stage all the work is done manually, so throughput is low and response time is too long.
A major problem is the lack of security checks that should have been applied. Finding the details
of a customer's transaction was very difficult, because the data was stored in different books at
different places. In case of any problem, no one can solve it unless the person responsible
is present. Current communication relies entirely on telephone conversations or personal
meetings. Post computerization, staff can interact using the internet. The last
point of operational feasibility is the handling and upkeep of the software: at every point of the design
I will take care that menu options are not too complex, can be easily learned, and require the least
amount of technical skill, as the operators are going to be from non-computer backgrounds.
Economic analysis is the most frequently used technique for evaluating the effectiveness of a
proposed system. More commonly known as cost/benefit analysis, the procedure is to determine
the benefits and savings that are expected from a proposed system and compare them with its costs.
If the benefits outweigh the costs, a decision is taken to design and implement the system. Otherwise,
further justification or alternatives in the proposed system will have to be made if it is to have a
chance of being approved. This is an ongoing effort that improves in accuracy at each phase of
the system life cycle.
• Hardware Requirements
• Software Requirements
Hardware used:
Software used:
24
Languages/IDEs used for these Products:
25
Chapter – 4
Literature Survey
26
• One language for many domains
A typical data science project includes various domains like data extraction, data
manipulation, data analysis, feature extraction, modelling, evaluation, deployment and
updating the solution. As Python is a multi-purpose language, it allows the data scientist
to address all these domains from a common platform.
4.3 Built-in Data Types in Python
• In programming, data type is an important concept.
• Variables can store data of different types, and different types can do different things.
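• Python has the following data types built-in by default, in these categories:
Text Type: str
Numeric Types: int, float, complex
Sequence Types: list, tuple, range
Mapping Type: dict
Set Types: set, frozenset
Boolean Type: bool
Binary Types: bytes, bytearray, memoryview
None Type: NoneType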
4.4 List
• Lists are used to store multiple items in a single variable.
• Lists are one of 4 built-in data types in Python used to store collections of data, the other
3 are Tuple, Set, and Dictionary, all with different qualities and usage.
• Lists are created using square brackets.
• Example:
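list1 = [1, 2, 'abc', 3, 4]  # hypothetical values; the original definition is missing
list2 = []
list3 = list((1,2,3))
print(list1)
# Output: [1, 2, 'abc', 3, 4]
print(list2)
# Output: []
print(list3)
# Output: [1, 2, 3]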
4.5 Tuple
• Tuples are similar to lists. Like lists, they are iterable, ordered, and can contain
duplicate values.
• But unlike lists, tuples are immutable.
• Example:
tuple1=(1,2,'abc', 3, 4)
tuple2=()
tuple3=tuple((1,3,5,"hello"))
print(tuple1)
print(tuple2)
# Output: ()
print(tuple3)
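Immutability can be checked directly. In this short sketch (the variable names are illustrative), assigning to a tuple element raises a TypeError, so a changed tuple has to be built as a new object:

```python
point = (1, 2, 'abc')

try:
    point[0] = 99  # item assignment is not supported by tuples
    changed = True
except TypeError:
    changed = False

# "Changing" a tuple means building a new one from the old pieces.
updated = (99,) + point[1:]
print(changed, point, updated)
```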
4.6 Set
• A set is another data structure; it holds an unordered, iterable and mutable collection of
data.
• It only contains unique elements.
• Example:
set1={1,2,3,'abc', 6}
print(set1)
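The uniqueness and mutability of sets can be seen in a short sketch (the values are arbitrary):

```python
numbers = [3, 1, 3, 2, 1]
unique = set(numbers)  # duplicates collapse to a single entry
unique.add(2)          # adding an existing element changes nothing
unique.add(7)          # sets are mutable, so new elements can be added
print(len(unique), 7 in unique)
```

Because a set is unordered, the order in which its elements print is not guaranteed, so membership tests and `len` are the reliable ways to inspect it.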
4.7 Dictionary
• Unlike all other collection types, dictionaries strictly contain key-value pairs.
• Example:
dict1={"key1":"value1","key2":"value2"}
dict2={}
dict3=dict({1:"one",2:"two",3:"three"})
print(dict1)
print(dict2)
# Output: {}
print(dict3)
i. NumPy
It is another useful component that makes Python one of the favorite languages for Data
Science. NumPy stands for Numerical Python and is built around multidimensional array objects.
By using NumPy, we can perform the following important operations:
Mathematical and logical operations on arrays.
Fourier transformation
Operations associated with linear algebra.
NumPy can also be seen as a replacement for MATLAB, because NumPy is mostly used along
with SciPy (Scientific Python) and Matplotlib (plotting library).
Example:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])  # create a one-dimensional array
print(arr)
print(type(arr))
Output:
[1 2 3 4 5]
<class 'numpy.ndarray'>
ii. Pandas
It is another useful Python library that makes Python one of the favorite languages for Data
Science. Pandas is used for data manipulation, wrangling and analysis. It was developed
by Wes McKinney in 2008. With the help of Pandas, we can accomplish the
following five steps in data processing:
Load
Prepare
Manipulate
Model
Analyze
Example:
import pandas as pd
df = pd.read_csv('data.csv')
print(df.to_string())
Output:
iii. Scikit-learn
Scikit-learn is another useful and important library for Data Science and machine learning in
Python. The following are some of the features that make Scikit-learn so useful:
It is built on NumPy, SciPy, and Matplotlib.
It is open source and can be reused under the BSD license.
It is accessible to everybody and can be reused in various contexts.
A wide range of machine learning algorithms, covering major areas of ML such as
classification, clustering, regression, dimensionality reduction and model selection,
can be implemented with its help.
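Example:
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import KFold, cross_val_score
X, y = datasets.load_iris(return_X_y=True)
clf = DecisionTreeClassifier(random_state=42)
k_folds = KFold(n_splits = 5)
scores = cross_val_score(clf, X, y, cv=k_folds)  # one accuracy score per fold
print("Average CV score:", scores.mean())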
Output:
iv. Matplotlib
• Matplotlib is a low level graph plotting library in python that serves as a visualization
utility.
• Matplotlib was created by John D. Hunter.
• Matplotlib is open source and we can use it freely.
• Matplotlib is mostly written in python, a few segments are written in C, Objective-C and
Javascript for Platform compatibility.
• Example:
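import matplotlib.pyplot as plt
import numpy as np
plt.plot(np.array([0, 6]), np.array([0, 250]))  # illustrative points; the original values are not shown
plt.show()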
Output:
Fig. 4.1
33
Chapter -5
Project Work
5.1 Introduction to the project
• In this project we use the dataset of IPL from season 2008 to 2019.
• IPL dataset is taken to analyze the metrics of different teams in IPL. Libraries such
as pandas, matplotlib, and seaborn are used to perform exploratory data analysis on top
of this IPL data.
• Finally, some machine learning algorithms are implemented to predict which team
has a better likelihood of winning the tournament.
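One of the simplest team metrics in such an analysis is the win rate per team. The sketch below computes it with the standard library on invented match records so that it runs without the dataset. The team names and results are hypothetical, and the project itself works on the IPL data with pandas.

```python
from collections import Counter

# Hypothetical match records: (team_a, team_b, winner). Not real IPL results.
matches = [
    ("Team A", "Team B", "Team A"),
    ("Team A", "Team C", "Team C"),
    ("Team B", "Team C", "Team C"),
    ("Team A", "Team B", "Team A"),
]

played = Counter()
won = Counter()
for team_a, team_b, winner in matches:
    played[team_a] += 1
    played[team_b] += 1
    won[winner] += 1

# Win rate = matches won / matches played, for every team that played.
win_rate = {team: won[team] / played[team] for team in played}
best_team = max(win_rate, key=win_rate.get)
print(win_rate, best_team)
```

A metric like this can serve as one input feature for a classifier. It is not a prediction by itself.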
34
Fig. 5.2: Flow diagram
35
5.3 Coding
5.4 Result and Discussion:
Random Forest was observed to be the most accurate classifier, with an accuracy of 89.15% in predicting
player performance.
This knowledge can be used in future to predict the winning teams for the next series of IPL
matches, and hence to help form the best team. This project opens scope for
future work in the field of cricket on predicting other important things such as the best team of players,
best venue, best city, and best fielding decision to win a match.
44
Conclusion and Bibliography
Conclusion:
Selection of the best team for a cricket match plays a significant role in the team’s victory. The
main goal of this project is to analyse the IPL cricket data and predict the players’ performance.
Here, three classification algorithms are used and compared to find the most accurate algorithm.
The implementation tools used are Anaconda Navigator and Jupyter. Random Forest is observed
to be the most accurate classifier, with an accuracy of 89.15% in predicting player performance.
References:
45
Appendix
Name | Type | Use | Linear/nonlinear | Requires normalization
Linear regression | Regression | Model a scalar target with one or more quantitative features. | Linear | Yes
R: www.inside-r.org/r-doc/stats/lm
Python: http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression
… kit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
R: https://cran.r-project.org/web/packages/e1071/vignettes/svmdoc.pdf
Python: http://scikit-learn.org/stable/modules/svm.html
SVM with … | Classification | SVM with support for a variety of nonlinear models. | Nonlinear | Yes
… project.org/web/packages/e1071/vignettes/svmdoc.pdf
Python: http://scikit-learn.org/stable/modules/svm.html
K-nearest neighbors | Classification/regression | Targets are computed based on those of the training set that are “nearest” to the test examples via a distance formula (for … | Nonlinear | Yes
… learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html
Decision trees | Classification/regression | Training data is recursively split into subsets based on attribute value tests, and decision trees that predict targets … error rates. | Nonlinear | No
… kit-learn.org/stable/modules/tree.html#tree
… project.org/web/packages/randomForest/randomForest.pdf
Python: http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
… project.org/web/packages/gbm/gbm.pdf https://cran.r- …
… cikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html
Naïve … | Classification | A simple, scalable classification algorithm used especially in … | Nonlinear | Yes
R: https://cran.r- …
… learn.org/stable/modules/classes.html#module-sklearn.naive_bayes
Neural … | Classification | Used to estimate unknown functions that are based on a large … | Nonlinear | Yes
… project.org/web/packages/neuralnet/neuralnet.pdf https://cran.r- …
… learn.org/dev/modules/neural_networks_supervised.html
http://deeplearning.net/software/theano/
… APIs. https://github.com/JohnLangford/vowpal_wabbit/wiki
… algorithm. https://xgboost.readthedocs.org/en/latest/