
The Gini Impurity (GI) metric measures the homogeneity of a set of items: the lower the value, the more homogeneous (pure) the set.

The lowest possible value of GI is 0.0. The maximum value of GI depends on the particular problem being investigated, but it approaches 1.0 as the number of classes increases.
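Concretely, if $p_i$ is the proportion of items belonging to class $i$, the impurity is

$$ \text{GI} = 1 - \sum_i p_i^2 $$

For $k$ equally represented classes this evaluates to $1 - 1/k$, so with three classes the maximum is $1 - 1/3 \approx 0.667$, and the value gets close to 1.0 only when the number of classes is large.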

Suppose, for example, you have 12 items: apples, grapes, and lemons. If there are 0 apples, 0 grapes, and 12 lemons, then you have minimal impurity (which is good for decision trees) and GI = 0.0. But if you have 4 apples, 4 grapes, and 4 lemons, you have maximum impurity, and it turns out that GI = 0.667.
I'll show example calculations for each case.

Maximum GI: Apples, Grapes, Lemons

When the items are evenly distributed across the classes, as in the 4-4-4 example above, you have maximum GI, but the exact value depends on how many classes there are.

A Bit Less Than Maximum GI

When the items are not quite evenly distributed, the GI is slightly less than the maximum (which is better when used for decision trees).

Minimum GI

When every item belongs to a single class, as in the 0-0-12 example, there is no impurity at all and GI = 0.0.
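Using the formula above, the three cases work out as follows. (The middle distribution, 5-4-3, is my assumption for illustration; the original figure with the exact counts is not reproduced here.)

$$ \text{GI}_{4,4,4} = 1 - 3\left(\tfrac{4}{12}\right)^2 = 1 - \tfrac{1}{3} \approx 0.667 $$

$$ \text{GI}_{5,4,3} = 1 - \left(\tfrac{5}{12}\right)^2 - \left(\tfrac{4}{12}\right)^2 - \left(\tfrac{3}{12}\right)^2 = 1 - \tfrac{50}{144} \approx 0.653 $$

$$ \text{GI}_{0,0,12} = 1 - 0^2 - 0^2 - \left(\tfrac{12}{12}\right)^2 = 0.0 $$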
The Gini impurity (sometimes called the Gini index) is not at all the same as a different metric called the Gini coefficient. The Gini impurity metric can be used when creating a decision tree, but there are alternatives, including entropy-based information gain. The advantage of GI is its simplicity.
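To illustrate that simplicity, here is a minimal Python sketch of the computation; the function name and fruit labels are mine, not from any particular library.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

# The two 12-item distributions from the article:
print(gini(["lemon"] * 12))                                 # 0.0    -> minimum impurity
print(gini(["apple"] * 4 + ["grape"] * 4 + ["lemon"] * 4))  # ~0.667 -> maximum for 3 classes
```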

Information Gain
Information gain is another metric, one that tells us how much a question unmixes the labels at a
node. "Mathematically it is just a difference between impurity values before splitting the data at a node
and the weighted average of the impurity after the split". For instance, we can go back to our data of
apples, lemons, and grapes, ask the question "Is the color Green?", and compare the impurity before the split with the weighted impurity of the two resulting groups.

The information gain from asking this question is 0.144. Similarly, we can ask another question from the set
of possible questions, split the data, and compute the information gain for each. Repeatedly choosing the best question and splitting at each node is called recursive binary splitting.
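As a concrete sketch of the computation: the 0.144 figure above comes from the article's original dataset, which is not reproduced here, so the split below is a pure assumption (I pretend that only the grapes in the 12-item example are green), and the resulting gain differs from 0.144.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def info_gain(parent, left, right):
    """Impurity before the split minus the size-weighted impurity after it."""
    n = len(parent)
    after = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - after

# Assumed, purely illustrative split: only grapes answer "yes" to "Is the color Green?"
fruits = ["apple"] * 4 + ["grape"] * 4 + ["lemon"] * 4
green = [f for f in fruits if f == "grape"]
not_green = [f for f in fruits if f != "grape"]
print(info_gain(fruits, green, not_green))  # ~0.333 under these assumptions
```

Under these assumptions the split gains 0.333: the green branch is perfectly pure (GI = 0.0), and the remaining 8 items are an even apple/lemon mix (GI = 0.5), so the weighted impurity after the split drops from 0.667 to 0.333.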
