
Chandigarh Engineering College

Landran, Mohali-140307

4 Week Summer Training Report


On

AI & ML USING PYTHON PROGRAMMING

B.Tech – IIIrd Year (5th Semester)

Department of Electronics and Communication Engineering

Submitted by:

Name: Gurkirat Kaur


Roll No.: 2002702
Section: A1
Subject: 4-Week Industrial Training
Subject Code: BTEC-521-18
FINAL REPORT (for 4 weeks)

Name of company/Organization …………………………………………


Address …………………………………………
Software …………………………………………
Name of Project allotted (if any) …………………………………………
Period of training is from …………………… to ………………
Deptt./Branch/Section where training is going on .………………………………….

TRAINING INCHARGE

Name & Signature with Seal ………………………………………..

Date of Submission ..…..………..

TRAINING INCHARGE

Name & Signature…………………

Designation …………………

Department …………………

Seal ……………………………….

SUBMITTED BY

Name ………………

Branch ………………

Univ. Roll No. ………………

PHOTOCOPY OF CERTIFICATE OF TRAINING

ACKNOWLEDGEMENT

I am overwhelmed, in all humbleness and gratefulness, to acknowledge my debt to all those who have helped me put these ideas well above the level of simplicity and into something concrete.

I would like to express my special thanks and gratitude to my trainer, Mr. Dev from TCIL-IT, as well as our HOD, Prof. (Dr.) Vinay Bhatia, who gave me the golden opportunity to do this wonderful project and training on the topic “IPL Win Predictor”, which also required a great deal of research, through which I came to know about many new things. I am really thankful to them.

No attempt at any level can be satisfactorily completed without the support and guidance of one’s parents and friends. I would like to thank my parents, who helped me a great deal in gathering information, collecting data and guiding me from time to time in making this project; despite their busy schedules, they gave me different ideas for making this project unique.

DECLARATION

I hereby declare that the project entitled “IPL Win Predictor” is an authentic record of my work, carried out as a requirement for the award of the degree of Bachelor of Technology (ECE) at Chandigarh Engineering College, Landran (Mohali), under the guidance of Er. Dev Pandya from TCIL-IT, during the period of 4-week summer training from 21/07/2022 to 12/08/2022.

(Signature of student)
Name & Roll No

LIST OF FIGURES

FIGURE NO.    PAGE NO.

1.1 10
1.2 11
2.1 14
2.2 14
2.3 15
2.4 16
2.5 16
2.6 17
2.7 18
2.8 18
2.9 19
2.10 20
4.1 33
5.1 34
5.2 35
5.3 44

LIST OF TABLES

TABLE NO.    PAGE NO.

4.1 27
7.1 46 - 49

TABLE OF CONTENTS

S.No. Title Page No.

1. Chapter-1
Company Profile

1.1 TCIL-IT Chandigarh 10 - 12


1.2 Vision and Mission
1.3 Training Domains
1.4 Address
2. Chapter-2
Introduction about AI and ML

2.1 Artificial Intelligence (AI)


2.2 Machine Learning (ML)
2.3 Deep Learning (DL) 13 – 21
2.4 Popular Machine Learning Algorithms
2.4.1 Supervised Machine Learning
2.4.2 Unsupervised Machine Learning
2.5 Artificial Neural Networks
2.6 Applications of Machine Learning
3. Chapter – 3
Feasibility Study

3.1 Introduction to Feasibility study


3.2 Technical feasibility 22 - 25
3.3 Operational Feasibility
3.4 Economic Feasibility
3.5 Operating Environment
3.6 Methodology of work
4. Chapter – 4
Literature Survey
26 - 33
4.1 An Introduction to Python
4.2 Python for Data Science
4.3 Built-in Data Types in Python
4.4 List
4.5 Tuple
4.6 Set
4.7 Dictionary
4.8 Python Libraries for Machine Learning :
i. NumPy
ii. Pandas
iii. Scikit-learn
iv. Matplotlib
5. Chapter-5
Project Work

5.1 Introduction to the project


5.2 Working (Flow chart/ screenshots) 34 - 44
5.3 Coding
5.4 Result and Discussion
5.5 Future scope of project
6. Conclusion and Bibliography 45

7. Appendix 46 - 49

CHAPTER-1
COMPANY PROFILE

1.1 TCIL – IT Chandigarh:

Fig. 1.1

TCIL-IT is a leading provider of six-month and six-week industrial training for students in Chandigarh. TCIL-IT is the training division of TCIL, a premier engineering organization and a Government of India Enterprise under the Ministry of Communications and Information Technology, associated with the administrative control of the Department of Telecommunications (DoT), which was established in 1978. Later, in 1999, ICS initiated the six-month/six-week training division with TCIL-IT, which is managed by ICSIL in Chandigarh. This joint venture is a coordination of the Delhi State Industrial Infrastructure Development Corporation (DSIIDC), an undertaking of the Delhi Government, and Telecommunication Consultants India Limited (TCIL) itself.
TCIL-IT Chandigarh is one of the fastest emerging companies in the IT and telecommunications industry. Being a well-accredited company, we specialize in various industrial training programs and have maintained a strong foothold ever since our inception in 1999 with ICS, Chandigarh. Intelligent Communication Systems India Ltd. (ICSIL) controls its managerial and administrative aspects. ICSIL is a joint venture of the Delhi State Industrial Infrastructure Development Corporation, an undertaking of the Delhi Government, and TCIL, an enterprise of the Government of India under the Ministry of Communications and Information Technology, New Delhi. We offer training in various technical areas of specialization for B.Tech students of CSE/IT/ECE/Electrical, Diploma

or MCA/MSc-IT. The duration of training can stretch from 6 weeks to 6 months, depending upon the university’s and institute’s training criteria.
By educating skilled IT and telecom professionals capable of fulfilling all the requirements of industry, we aim to serve our nation. Our entire staff consists of professionals with extensive industry experience. With years of experience and thorough knowledge, these professionals help students in their overall professional growth, as well as in developing the right attitude and an impressive outlook, by sharing their knowledge. At TCIL-IT Chandigarh, stress is laid on practical knowledge and training, and this is done by giving students the opportunity to work on live projects under people with expertise in the respective field. Scheduled work forms the basis of our operations, and we firmly believe in it. All training at TCIL-IT follows a well-planned schedule so that all students receive complete information in a well-organized manner. We have a strong track record of students, trained by us, who have achieved success in the industry in their fields of expertise. We have industrial relations and associations with several top-notch MNCs, and we try to provide nothing but the best to our students.
1.2 Vision and Mission:

To excel and maintain leadership in providing Optimal solutions for Telecommunications and
Information Technology Service Sector globally and to diversify by providing excellent
infrastructure facilities particularly in the high-tech areas.

1.3 Training Domains:

Fig. 1.2

TCIL-IT provides industrial training in the following domains:
IT & CSE:
 Core and Advanced Java (J2EE or J2ME)
 Advance .Net Technology (ASP.Net, C#, VB) with SQL Server
 Practical Web Designing Applications
 IT Security & Ethical Hacking
 Android Application Development Program
 Oracle DBA
 Core & Advance PHP (Web Development Applications)
 Software Testing (Manual / Automatic)
 Networking Technology (MCITP / CCNA)
 Cloud Computing
Electronics and Communication:
 Understanding Emerging Wireless Communication Technologies
 Embedded System – 8/16 bit Microcontroller | PIC | ARM
 Wireless Communications
 Advanced Telecom Training
 Embedded System (GSM)
 VLSI
 Networking (Intranet/LAN/WAN), Linux Administration
 PLC Automation
Electrical Engineering:
 Electrical Plant Design Engineering.

1.4 Address:

TCIL-IT (ICS)
S.C.O. 3017-18, Second Floor
Opp. Kisan Bhavan
Sector 22D, Chandigarh- 160 022
Email: tcilchd@gmail.com
Telephone: 9876795015, 9781554540

Chapter-2

Introduction about AI and ML

2.1 Artificial Intelligence (AI)

Artificial Intelligence comprises two words, “Artificial” and “Intelligence”. Artificial refers to something which is made by humans, a non-natural thing, and intelligence means the ability to understand or think. There is a misconception that Artificial Intelligence is a system; it is not a system, AI is implemented in a system. There can be many definitions of AI; one definition is “It is the study of how to train computers so that they can do things which, at present, humans do better”. It is, therefore, an attempt to give a machine all the capabilities that a human possesses.

2.2 Machine Learning (ML)

Machine Learning is learning in which a machine can learn on its own without being explicitly programmed. It is an application of AI that provides a system the ability to automatically learn and improve from experience. Here we can generate a program by integrating the input and output of that program. One common definition of Machine Learning is: “A program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.”

2.3 Deep Learning (DL)

Deep Learning is a subset of Machine Learning. It is based on learning by example, just like humans do, using Artificial Neural Networks. These Artificial Neural Networks are created to mimic the neurons in the human brain so that Deep Learning algorithms can learn much more efficiently. Deep Learning is so popular now because of its wide range of applications in modern technology. From self-driving cars to image recognition, speech recognition and natural language processing, Deep Learning is used to achieve results that were not possible before.

Fig. 2.1

Fig. 2.2

2.4 Popular Machine Learning Algorithms:

Let’s look at some of the popular Machine Learning algorithms that are based on specific
types of Machine Learning.

2.4.1 Supervised Machine Learning

Supervised learning algorithms or methods are the most commonly used ML algorithms. These algorithms take data samples, i.e. the training data, together with their associated outputs, i.e. labels or responses, for each data sample during the training process. Examples of supervised machine learning algorithms include Decision Trees, Random Forest, KNN, Logistic Regression, etc.

Fig. 2.3

Supervised Machine Learning includes Regression and Classification algorithms. Some of the
more popular algorithms in these categories are:

1. Linear Regression Algorithm

The Linear Regression Algorithm models the relationship between an independent and a dependent variable. It demonstrates the impact on the dependent variable when the independent variable is changed in any way. The independent variable is called the explanatory variable, and the dependent variable is called the factor of interest.

Fig. 2.4

An example of Linear Regression Algorithm usage is analyzing property prices in an area according to the size of the property, the number of rooms, etc.
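As a minimal sketch of this property-price example using scikit-learn (the sizes and prices below are made-up illustration data, not figures from this report):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: property size in sq ft (explanatory variable)
# vs. price in lakhs (factor of interest)
X = np.array([[600], [800], [1000], [1200], [1500]])
y = np.array([30, 40, 50, 60, 75])

# Fit a straight line through the points and predict a new price
model = LinearRegression().fit(X, y)
print(model.predict([[1100]]))  # → [55.] (the data lies exactly on y = 0.05x)
```

Because the toy data is perfectly linear, the fitted line recovers the relationship exactly; real price data would also include features such as the number of rooms as extra columns of X.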
2. Logistic Regression Algorithm

Fig. 2.5

The Logistic Regression Algorithm deals in discrete values, whereas the Linear Regression Algorithm handles predictions in continuous values. This makes Logistic Regression the better option for binary classification: an event is classified as 1 if it occurs and as 0 otherwise. Hence, the probability of a particular event occurring is predicted based on the given predictor variables. An example of Logistic Regression Algorithm usage is in medicine, to predict whether a person has malignant breast cancer tumors based on the size of the tumors.
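A hedged sketch of the tumor-size example with scikit-learn (the sizes and labels are invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: tumour size in cm; label 1 = malignant, 0 = benign
X = np.array([[0.5], [1.0], [1.5], [2.0], [3.0], [3.5], [4.0], [4.5]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = LogisticRegression().fit(X, y)

print(clf.predict([[1.2], [3.8]]))  # predicted classes (0 or 1)
print(clf.predict_proba([[3.8]]))   # probability of each class
```

`predict` returns the discrete 0/1 decision, while `predict_proba` exposes the underlying event probability mentioned in the text.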

3. Naive Bayes Classifier Algorithm

Fig. 2.6

The Naive Bayes Classifier Algorithm is used to classify texts such as a web page, a document or an email, among other things. This algorithm is based on the Bayes theorem of probability, and it allocates each element to one of the available categories. An example of Naive Bayes Classifier usage is email spam filtering: Gmail uses this algorithm to classify an email as Spam or Not Spam.
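A toy spam-filter sketch in scikit-learn; the four messages and their labels are invented, and a real filter would train on a far larger labelled corpus:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical training messages; 1 = spam, 0 = not spam
texts = ["win cash prize now", "limited offer win money",
         "meeting at noon tomorrow", "project report attached"]
labels = [1, 1, 0, 0]

# Turn each message into word counts, then apply Bayes' theorem per class
vec = CountVectorizer()
X = vec.fit_transform(texts)
clf = MultinomialNB().fit(X, labels)

print(clf.predict(vec.transform(["win a cash offer"])))  # → [1] (spam)
```

Words like "win" and "cash" appear only in spam training messages, so the class-conditional probabilities push the new message toward the spam category.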

2.4.2 Unsupervised Machine Learning

As the name suggests, it is the opposite of supervised ML methods or algorithms, which means that in unsupervised machine learning we do not have any supervisor to provide any sort of guidance. Unsupervised learning algorithms are handy in scenarios in which we do not have the liberty, as in supervised learning, of having pre-labeled training data, and we want to extract useful patterns from the input data. Examples of unsupervised machine learning algorithms include K-means clustering, hierarchical clustering, etc.

Fig. 2.7

Unsupervised Machine Learning mainly includes Clustering algorithms. Some of the more
popular algorithms in this category are:

1. K Means Clustering Algorithm

Fig. 2.8

Let’s imagine that you want to search the name “Harry” on Wikipedia. Now, “Harry” can
refer to Harry Potter, Prince Harry of England, or any other popular Harry on Wikipedia! So
Wikipedia groups the web pages that talk about the same ideas using the K Means Clustering
Algorithm (since it is a popular algorithm for cluster analysis). K Means Clustering Algorithm
in general uses K number of clusters to operate on a given data set. In this manner, the output
contains K clusters with the input data partitioned among the clusters.
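A minimal sketch of K-means with K = 2 on made-up 2-D points (the data is illustrative, not Wikipedia page features):

```python
import numpy as np
from sklearn.cluster import KMeans

# Two visually obvious groups of points
X = np.array([[1, 1], [1.5, 2], [1, 0],
              [8, 8], [9, 9], [8, 9]])

# K = 2 clusters; the input data is partitioned among them
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(km.labels_)           # cluster index assigned to each point
print(km.cluster_centers_)  # the K cluster centroids
```

The first three points receive one cluster index and the last three the other; which group is labelled 0 depends on the random initialization, so only the grouping, not the index values, is meaningful.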

2. Apriori Algorithm

Fig. 2.9

The Apriori Algorithm uses the if-then format to create association rules. This means that if a
certain event 1 occurs, then there is a high probability that a certain event 2 also occurs. For
example: IF someone buys a car, THEN there is a high chance they buy car insurance as well.
The Apriori Algorithm generates this association rule by observing the number of people who
bought car insurance after buying a car. For example, Google auto-complete uses the Apriori
Algorithm. When a word is typed in Google, the Apriori Algorithm looks for the associated
words that are usually typed after that word and displays the possibilities.
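The support-and-confidence counting behind such if-then rules can be sketched in plain Python; the transactions below are invented, and a full Apriori implementation would additionally enumerate and prune candidate itemsets level by level:

```python
# Toy market-basket data for the rule: IF {car} THEN {car insurance}
transactions = [
    {"car", "car insurance"},
    {"car", "car insurance", "loan"},
    {"car"},
    {"bike"},
    {"car", "car insurance"},
]

def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

antecedent, consequent = {"car"}, {"car insurance"}

# Confidence = P(consequent | antecedent)
conf = support(antecedent | consequent) / support(antecedent)
print(f"support={support(antecedent | consequent):.2f}, confidence={conf:.2f}")
# → support=0.60, confidence=0.75: 4 of 5 buyers bought a car,
#   and 3 of those 4 also bought car insurance
```

A high confidence value is what lets the algorithm propose "car insurance" after "car", much as auto-complete proposes frequently following words.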

2.5 Artificial Neural Networks

Artificial Neural Networks are modeled after the neurons in the human brain. They contain artificial neurons, which are called units. These units are arranged in a series of layers that together constitute the whole Artificial Neural Network in a system. A layer can have anywhere from a dozen units to millions of units, depending on the complexity of the system. Commonly, Artificial Neural Networks have an input layer, an output layer and hidden layers. The input layer receives data from the outside world which the neural network needs to analyze or learn about.

Fig. 2.10

This data then passes through one or more hidden layers that transform the input into data that is valuable for the output layer. Finally, the output layer provides an output in the form of the network’s response to the input data provided.
In the majority of neural networks, units are interconnected from one layer to another. Each of these connections has a weight that determines the influence of one unit on another. As the data transfers from one unit to another, the neural network learns more and more about the data, which eventually results in an output from the output layer.
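One way to sketch such a layered network in code is with scikit-learn's MLPClassifier on the iris dataset; neither the library call nor the dataset is specified in this section, so this is an illustrative assumption:

```python
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Input layer: 4 features; one hidden layer of 10 units; output layer: 3 classes
X, y = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)  # scaling helps the weight updates converge

net = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=1)
net.fit(X, y)  # training adjusts the connection weights between layers

print("training accuracy:", net.score(X, y))
```

During `fit`, the weights on the connections between units are repeatedly adjusted, which is the "learning more and more about the data" described above.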

2.6 Applications of Machine Learning

Machine Learning is a rapidly growing technology, and according to researchers we are in the golden age of AI and ML. It is used to solve many complex real-world problems which cannot be solved with a traditional approach. The following are some real-world applications of ML:

 Emotion analysis
 Sentiment analysis
 Error detection and prevention
 Weather forecasting and prediction
 Stock market analysis and forecasting
 Speech synthesis
 Speech recognition
 Customer segmentation
 Object recognition
 Fraud detection
 Fraud prevention
 Recommendation of products to customers in online shopping.

CHAPTER-3
FEASIBILITY STUDY

3.1 Introduction to Feasibility Study

Feasibility is the determination of whether or not a project is worth doing; the process followed in making this determination is called a feasibility study. This, of course, determines whether a project can and should be undertaken. Once it has been determined that a project is feasible, the analyst can go ahead and prepare the project specification, which finalizes the project requirements. Generally, feasibility studies are undertaken within tight time constraints and normally culminate in a written and oral feasibility report. The contents and recommendations of such a study are used as a sound basis for deciding whether to proceed with, postpone or cancel the project. Thus, since the feasibility study may lead to the commitment of large resources, it becomes necessary that it be conducted competently and that no fundamental errors of judgment are made.

There are the following types of interrelated feasibility:

• Technical feasibility
• Economic feasibility
• Operational feasibility

3.2 Technical Feasibility

This is concerned with specifying equipment, software and hardware that will successfully satisfy the user requirements. The technical needs of the system may vary considerably, but might include:

• The facility to produce output in a given time.
• Response time under certain conditions.
• Ability to process a certain volume of transactions at a particular speed.
• Facility to communicate data to a distant location.

In examining technical feasibility, the configuration of the system is given more importance than the actual make of the hardware. The configuration should give a complete picture of the system requirements, such as what speeds of input and output should be achieved at a particular quality of printing.
According to the definition of technical feasibility, the compatibility between the front-end and back-end is very important. In our project the compatibility of both is very good; the degree of compatibility of JSP and SQL Server 2000 is very good. The speed of output is also very good: when we enter data and click a button, the response time is very fast and results are returned quickly. I never found difficulty when using complex queries or heavy transactions; the speed of transactions is always smooth and constant. This software provides the facility to communicate data to a distant location.

We use Java Server Pages and JavaScript. The design of the front-end of any project is very important, so we selected Java Server Pages and JavaScript as the front-end for the following reasons:

• Easy implementation of code.
• Well-defined interface with the database.
• Well-defined handshaking with SQL Server 2000.
• Easy debugging.

In the present scenario a number of back-ends are available, but I selected SQL Server 2000 for the following reasons:

• Able to handle large data.


• Security.
• Robust RDBMS.
• Backup & recovery.

With the help of the above support we remove the defects of the existing software, and in future we can easily switch over to any platform. Care is taken to ensure that the system does not halt in case of undesired situations or events, that a problem affecting one module does not affect any other module of the system, and that a change of hardware does not produce problems.

3.3 Operational Feasibility

It is mainly related to human, organizational and political aspects. The points to be considered are:

• What changes will be brought with the system? What organization structures are
disturbed?
• What new skills will be required?
• Do the existing staff members have these skills?
• If not, can they be trained in due course of time?

At the present stage all the work is done manually, so throughput and response time are poor. A major problem is the lack of security checks that should have been applied. Finding out details regarding a customer’s transactions was very difficult, because data was stored in different books and in different places. In case of any problem, no one could solve it unless the person responsible was present. Current communication is entirely by telephonic conversation or personal meetings; post computerization, staff can interact using the internet. Finally, regarding the handling and upkeep of the software: at every point of designing, care is taken that menu options are not too complex, can be easily learned, and require the least amount of technical skill, as the operators will be from a non-computer background.

3.4 Economic Feasibility

Economic analysis is the most frequently used technique for evaluating the effectiveness of a proposed system. More commonly known as cost/benefit analysis, the procedure is to determine the benefits and savings that are expected from a proposed system and compare them with the costs. If the benefits outweigh the costs, a decision is taken to design and implement the system. Otherwise, further justification of, or alterations to, the proposed system will have to be made if it is to have a chance of being approved. This is an ongoing effort that improves in accuracy at each phase of the system life cycle.

3.5 Operating Environment

Hardware and software requirements

• Hardware Requirements
• Software Requirements

Hardware used:

• Processor - 11th Gen Intel(R) Core(TM) i5-1135G7


• Speed - 3.00 GHZ
• RAM - 8GB
• Storage - 20 GB

Software used:

• Operating System - XP/7/8/8.1/10/11


• IDE used - Jupyter Notebook/Google Colab

3.6 Methodology of Work

Languages/IDEs used for these Products:

• Coding Language - Python


• IDE used - Jupyter Notebook/Google Colab
• Operating System - XP/7/8/8.1/10/11

Chapter – 4
Literature Survey

4.1 An Introduction to Python


Python is a popular object-oriented programming language with the capabilities of a high-level programming language. Its easy-to-learn syntax and portability make it popular these days. The following facts give us an introduction to Python −
• Python was developed by Guido van Rossum at Stichting Mathematisch Centrum in the Netherlands.
• It was written as the successor of a programming language named ‘ABC’.
• Its first version was released in 1991.
• The name Python was picked by Guido van Rossum from a TV show named Monty Python’s Flying Circus.
• It is an open-source programming language, which means that we can freely download it and use it to develop programs. It can be downloaded from www.python.org.
• The Python programming language has features of both Java and C. It has the elegance of ‘C’ code and, on the other hand, it has classes and objects like Java for object-oriented programming.
• It is an interpreted language, which means the source code of a Python program is first converted into bytecode and then executed by the Python virtual machine.

4.2 Python for Data Science


Python is among the most important and most popular languages for machine learning and data science. The following are the features of Python that make it the preferred language for data science −
• Extensive set of packages
Python has an extensive and powerful set of packages which are ready to be used in various domains. These include packages like numpy, scipy, pandas, scikit-learn etc., which are required for machine learning and data science.
• Easy prototyping
Another important feature of Python that makes it the choice of language for data science is its easy and fast prototyping. This feature is useful for developing new algorithms.
• Collaboration feature
The field of data science fundamentally needs good collaboration, and Python provides many useful tools that make collaboration extremely easy.

• One language for many domains
A typical data science project includes various domains like data extraction, data
manipulation, data analysis, feature extraction, modelling, evaluation, deployment and
updating the solution. As Python is a multi-purpose language, it allows the data scientist
to address all these domains from a common platform.
4.3 Built-in Data Types in Python
• In programming, data type is an important concept.
• Variables can store data of different types, and different types can do different things.
• Python has the following data types built-in by default, in these categories:

Text Type: str

Numeric Types: int, float, complex

Sequence Types: list, tuple, range

Mapping Type: dict

Set Types: set, frozenset

Boolean Type: bool

Binary Types: bytes, bytearray, memoryview

None Type: NoneType

Table No. 4.1

4.4 List
• Lists are used to store multiple items in a single variable.
• Lists are one of 4 built-in data types in Python used to store collections of data, the other
3 are Tuple, Set, and Dictionary, all with different qualities and usage.
• Lists are created using square brackets.
• Example:

list1 = [1 , 2, 'abc', 3, 'def']

list2 = []

list3 = list((1,2,3))

print(list1)

# Output: [1, 2, 'abc', 3, 'def']

print(list2)

# Output: []

print(list3)

# Output: [1, 2, 3]

4.5 Tuple

• Tuples are similar to lists. This collection also has iterable, ordered, and (can contain)
repetitive data, just like lists.
• But unlike lists, tuples are immutable.
• Example:

tuple1=(1,2,'abc', 3, 4)

tuple2=()

tuple3=tuple((1,3,5,"hello"))

print(tuple1)

# Output: (1, 2, 'abc', 3, 4)

print(tuple2)

# Output: ()

print(tuple3)

# Output: (1, 3, 5, 'hello')

4.6 Set
• Set is another data structure that holds a collection of unordered, iterable and mutable
data.
• It only contains unique elements.
• Example:

set1={1,2,3,'abc', 6}

print(set1)

# Output (element order may vary, since sets are unordered): {1, 2, 3, 'abc', 6}

4.7 Dictionary

• Unlike all other collection types, dictionaries strictly contain key-value pairs.
• Example:

dict1={"key1":"value1","key2":"value2"}

dict2={}

dict3=dict({1:"one",2:"two",3:"three"})

print(dict1)

# Output: {'key1': 'value1', 'key2': 'value2'} (Python 3.7+ preserves insertion order)

print(dict2)

# Output: {}

print(dict3)

# Output: {1: 'one', 2: 'two', 3: 'three'}

4.8 Python Libraries for Machine Learning:

i. NumPy
NumPy is another useful component that makes Python one of the favorite languages for Data Science. It basically stands for Numerical Python and consists of multidimensional array objects. By using NumPy, we can perform the following important operations −
 Mathematical and logical operations on arrays.
 Fourier transforms.
 Operations associated with linear algebra.
We can also see NumPy as a replacement for MATLAB, because NumPy is mostly used along with SciPy (Scientific Python) and Matplotlib (a plotting library).

Example:

import numpy as np

arr = np.array([1, 2, 3, 4, 5])

print(arr)

print(type(arr))

Output:

[1 2 3 4 5]

<class 'numpy.ndarray'>

ii. Pandas
Pandas is another useful Python library that makes Python one of the favorite languages for Data Science. It is basically used for data manipulation, wrangling and analysis. It was developed by Wes McKinney in 2008. With the help of Pandas, we can accomplish the following five steps in data processing −
 Load
 Prepare
 Manipulate
 Model
 Analyze

Example:

import pandas as pd
df = pd.read_csv('data.csv')
print(df.to_string())

Output:

Duration Pulse Maxpulse Calories

0 60 110 130 409.1


1 60 117 145 479.0
2 60 103 135 340.0
3 45 109 175 282.4
4 45 117 148 406.0
5 60 102 127 300.5
6 60 110 136 374.0
7 45 104 134 253.3
8 30 109 133 195.1
9 60 98 124 269.0
10 60 103 147 329.3
11 60 100 120 250.7
12 60 106 128 345.3
13 60 104 132 379.3
14 60 98 123 275.0
15 60 98 120 215.2
16 60 100 120 300.0
17 45 90 112 NaN
18 60 103 123 323.0
19 45 97 125 243.0
20 60 108 131 364.2
21 45 100 119 282.0
22 60 130 101 300.0
23 45 105 132 246.0
24 60 102 126 334.5
25 60 100 120 250.0

iii. Scikit-learn
Scikit-learn is another useful and important Python library for Data Science and machine learning. The following are some features of Scikit-learn that make it so useful −
 It is built on NumPy, SciPy, and Matplotlib.
 It is open source and can be reused under the BSD license.
 It is accessible to everybody and can be reused in various contexts.
 A wide range of machine learning algorithms, covering major areas of ML such as classification, clustering, regression, dimensionality reduction and model selection, can be implemented with its help.
 Example:

from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import KFold, cross_val_score

X, y = datasets.load_iris(return_X_y=True)
clf = DecisionTreeClassifier(random_state=42)

k_folds = KFold(n_splits=5)
scores = cross_val_score(clf, X, y, cv=k_folds)

print("Cross Validation Scores: ", scores)
print("Average CV Score: ", scores.mean())
print("Number of CV Scores used in Average: ", len(scores))

Output:

Cross Validation Scores: [1. 1. 0.83333333 0.93333333 0.8]

Average CV Score: 0.9133333333333333

Number of CV Scores used in Average: 5

iv. Matplotlib

• Matplotlib is a low-level graph-plotting library in Python that serves as a visualization utility.
• Matplotlib was created by John D. Hunter.
• Matplotlib is open source and we can use it freely.
• Matplotlib is mostly written in Python; a few segments are written in C, Objective-C and JavaScript for platform compatibility.
• Example:

import matplotlib.pyplot as plt

import numpy as np

xpoints = np.array([0, 6])

ypoints = np.array([0, 250])


plt.plot(xpoints, ypoints)

plt.show()

Output:

Fig. 4.1

Chapter -5
Project Work
5.1 Introduction to the project

• In this project we use the IPL dataset covering the seasons from 2008 to 2019, and we apply Machine Learning methods to it using Python.

• The IPL dataset is taken to analyze the metrics of different teams in the IPL. Libraries such as pandas, matplotlib, and seaborn are used to perform exploratory data analysis on top of this IPL data.

• Finally, some machine learning algorithms are implemented to predict which team has a better likelihood of winning the tournament.

5.2 Working (Flow chart/ screenshots)

Fig. 5.1: Proposed system architecture

Fig. 5.2: Flow diagram

5.3 Coding
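The project code appears in the report as screenshots. A minimal, hypothetical sketch of the train-and-evaluate flow described in sections 5.1 and 5.2 might look like the following; the column names and match values are illustrative stand-ins, not the actual IPL dataset:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Hypothetical stand-in for the IPL match data (the real dataset covers 2008-2019)
data = pd.DataFrame({
    "team1_avg_score":   [165, 150, 172, 158, 180, 149, 170, 155, 162, 168],
    "team2_avg_score":   [150, 165, 160, 170, 155, 175, 152, 168, 159, 150],
    "toss_won_by_team1": [1, 0, 1, 0, 1, 0, 1, 0, 1, 1],
    "team1_won":         [1, 0, 1, 0, 1, 0, 1, 0, 1, 1],  # target label
})

# Split features and label, then hold out 30% of matches for testing
X = data.drop(columns="team1_won")
y = data["team1_won"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

# Train a Random Forest (the best-performing classifier per section 5.4)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)

acc = accuracy_score(y_test, clf.predict(X_test))
print("accuracy:", acc)
```

With the real dataset, the exploratory steps (pandas, matplotlib, seaborn) would precede this split, and the same accuracy comparison would be repeated for each candidate classifier.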

5.4 Result and Discussion:

Random Forest is observed to be the most accurate classifier, with 89.15% accuracy, for predicting the best player performance.

Fig. 5.3: Result

5.5 Future scope of project

This knowledge can be used in future to predict the winning teams for the next series of IPL matches. Hence, using this prediction, the best team can be formed. This project opens scope for future work in the field of cricket, such as predicting other important things like the best team of players, the best venue, the best city, and the best fielding decision to win a match.

Conclusion and Bibliography

Conclusion:

Selection of the best team for a cricket match plays a significant role in the team’s victory. The main goal of this project is to analyse the IPL cricket data and predict the players’ performance. Here, three classification algorithms are used and compared to find the most accurate algorithm. The implementation tools used are Anaconda Navigator and Jupyter. Random Forest is observed to be the most accurate classifier, with 89.15% accuracy, for predicting the best player performance.

References:

 T. A. Severini, Analytic Methods in Sports: Using Mathematics and Statistics to Understand Data from Baseball, Football, Basketball, and Other Sports. Chapman and Hall/CRC, 2014.
 H. Ghasemzadeh and R. Jafari, “Coordination analysis of human movements with body sensor networks: A signal processing model to evaluate baseball swings,” IEEE Sensors Journal, vol. 11, no. 3, pp. 603–610, 2010.
 R. Rein and D. Memmert, “Big data and tactical analysis in elite soccer: future challenges and opportunities for sports science,” SpringerPlus, vol. 5, no. 1, p. 1410, 2016.
 V. Veppur Sankaranarayanan, J. Sattar and L. Lakshmanan, “Auto-play: A data mining approach to ODI cricket simulation and prediction,” SIAM Conference on Data Mining, 2014.
 K. A. A. D. Raj and P. Padma, “Application of association rule mining: A case study on team India,” 2013 International Conference on Computer Communication and Informatics, 2013.
 T. B. Swartz, P. S. Gill and S. Muthukumarana, “Modelling and simulation for one-day cricket,” Canadian Journal of Statistics, vol. 37, no. 2, pp. 143–16, 2009.

Appendix

Popular machine-learning algorithms:

Linear regression
  Type: Regression; Linear/nonlinear: Linear; Requires normalization: Yes
  Use: Model a scalar target with one or more quantitative features. Although regression computes a linear combination, features can be transformed by nonlinear functions if the relationships are known or can be guessed.
  R: www.inside-r.org/r-doc/stats/lm
  Python: http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression

Logistic regression
  Type: Classification; Linear/nonlinear: Linear; Requires normalization: Yes
  Use: Categorize observations based on quantitative features; predict the target class or the probabilities of target classes.
  R: www.statmethods.net/advstats/glm.html
  Python: http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html

SVM
  Type: Classification/regression; Linear/nonlinear: Linear; Requires normalization: Yes
  Use: Classification based on separation in high-dimensional space; predicts target classes. Target-class probabilities require additional computation. Regression uses a subset of the data, and performance is highly data dependent.
  R: https://cran.r-project.org/web/packages/e1071/vignettes/svmdoc.pdf
  Python: http://scikit-learn.org/stable/modules/svm.html

SVM with kernel
  Type: Classification/regression; Linear/nonlinear: Nonlinear; Requires normalization: Yes
  Use: SVM with support for a variety of nonlinear models.
  R: https://cran.r-project.org/web/packages/e1071/vignettes/svmdoc.pdf
  Python: http://scikit-learn.org/stable/modules/svm.html

K-nearest neighbors
  Type: Classification/regression; Linear/nonlinear: Nonlinear; Requires normalization: Yes
  Use: Targets are computed from those of the training examples that are "nearest" to the test examples under a distance formula (for example, Euclidean distance). For classification, training targets "vote"; for regression, they are averaged. Predictions are based on a "local" subset of the data but are highly accurate for some datasets.
  R: https://cran.r-project.org/web/packages/class/class.pdf
  Python: http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html

Decision trees
  Type: Classification/regression; Linear/nonlinear: Nonlinear; Requires normalization: No
  Use: Training data is recursively split into subsets based on attribute-value tests, and decision trees that predict targets are derived. Produces understandable models, but random forest and boosting algorithms nearly always produce lower error rates.
  R: www.statmethods.net/advstats/cart.html
  Python: http://scikit-learn.org/stable/modules/tree.html#tree

Random forest
  Type: Classification/regression; Linear/nonlinear: Nonlinear; Requires normalization: No
  Use: An "ensemble" of decision trees is used to produce a stronger prediction than a single decision tree. For classification, multiple decision trees "vote"; for regression, their results are averaged.
  R: https://cran.r-project.org/web/packages/randomForest/randomForest.pdf
  Python: http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html

Boosting
  Type: Classification/regression; Linear/nonlinear: Nonlinear; Requires normalization: No
  Use: For multitree methods, boosting algorithms reduce generalization error by adjusting weights to give greater weight to examples that are misclassified or (for regression) those with larger residuals.
  R: https://cran.r-project.org/web/packages/gbm/gbm.pdf and https://cran.r-project.org/web/packages/adabag/adabag.pdf
  Python: http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html

Naïve Bayes
  Type: Classification; Linear/nonlinear: Nonlinear; Requires normalization: Yes
  Use: A simple, scalable classification algorithm used especially in text-classification tasks (for example, spam classification). It assumes independence between features (hence, naïve), which is rarely the case, but the algorithm works surprisingly well in specific cases. It uses Bayes' theorem but is not "Bayesian" in the statistical sense.
  R: https://cran.r-project.org/web/packages/e1071/
  Python: http://scikit-learn.org/stable/modules/classes.html#module-sklearn.naive_bayes

Neural network
  Type: Classification/regression; Linear/nonlinear: Nonlinear; Requires normalization: Yes
  Use: Used to estimate unknown functions based on a large number of inputs, trained through the back-propagation algorithm. Generally more complex and computationally expensive than other methods, but powerful for certain problems; the basis of many deep-learning methods.
  R: https://cran.r-project.org/web/packages/neuralnet/neuralnet.pdf and https://cran.r-project.org/web/packages/nnet/nnet.pdf
  Python: http://scikit-learn.org/dev/modules/neural_networks_supervised.html and http://deeplearning.net/software/theano/

Vowpal Wabbit
  Type: Classification/regression
  Use: An online ML program developed by John Langford at Yahoo Research, now Microsoft. It incorporates various algorithms, including ordinary least squares and single-layer neural nets. As an online ML program, it doesn't require all data to fit in memory, and it is known for fast processing of large datasets. Vowpal Wabbit has a unique input format and is generally run from a command line rather than through APIs.
  https://github.com/JohnLangford/vowpal_wabbit/wiki

XGBoost
  Type: Classification/regression
  Use: A highly optimized and scalable version of the boosted-decision-trees algorithm.
  https://xgboost.readthedocs.org/en/latest/

Table No. 7.1
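To make the table concrete, the sketch below compares several of the listed algorithms in scikit-learn on a synthetic dataset. Algorithms marked as requiring normalization are wrapped in a StandardScaler pipeline, while the tree-based methods are used directly; the dataset, hyperparameters, and cross-validation setup are illustrative assumptions, not part of the project's reported experiments.

```python
# Hedged sketch comparing algorithms from the table above.
# "Requires normalization: Yes" entries get a StandardScaler pipeline;
# tree-based methods ("No") are used as-is. Dataset is synthetic.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

models = {
    "Logistic regression": make_pipeline(StandardScaler(), LogisticRegression()),
    "SVM (RBF kernel)":    make_pipeline(StandardScaler(), SVC()),
    "K-nearest neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Naive Bayes":         make_pipeline(StandardScaler(), GaussianNB()),
    "Decision tree":       DecisionTreeClassifier(random_state=0),
    "Random forest":       RandomForestClassifier(random_state=0),
    "Gradient boosting":   GradientBoostingClassifier(random_state=0),
}

# Mean 5-fold cross-validation accuracy per model, best first.
scores = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}
for name, s in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} {s:.3f}")
```

The relative ranking depends entirely on the dataset; on the project's IPL data the same comparison produced Random Forest as the winner, as reported in Section 5.4.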

