
Chapter 18, Sections 1–3


Outline

♦ Learning agents
♦ Inductive learning
♦ Decision tree learning
♦ Measuring learning performance


Learning

Learning is essential for unknown environments, i.e., when the designer lacks omniscience
Learning is useful as a system construction method, i.e., expose the agent to reality rather than trying to write it down
Learning modifies the agent’s decision mechanisms to improve performance


Learning agents

[Figure: learning agent architecture. The Critic, applying a fixed performance standard, observes via the Sensors and sends feedback to the Learning element; the Learning element makes changes to the Performance element, receives knowledge from it, and sets learning goals for the Problem generator, which proposes experiments; the Performance element maps percepts from the Sensors to actions on the Environment through the Effectors.]

Learning element

Design of learning element is dictated by
♦ what type of performance element is used
♦ which functional component is to be learned
♦ how that functional component is represented
♦ what kind of feedback is available
Example scenarios:

| Performance element | Component | Representation | Feedback |
|---|---|---|---|
| Alpha-beta search | Eval. fn. | Weighted linear function | Win/loss |
| Logical agent | Transition model | Successor-state axioms | Outcome |
| Utility-based agent | Transition model | Dynamic Bayes net | Outcome |
| Simple reflex agent | Percept-action fn. | Neural net | Correct action |

**Supervised learning**: correct answers for each instance
**Reinforcement learning**: occasional rewards


Inductive learning (a.k.a. Science)

Simplest form: learn a function from examples (tabula rasa)
f is the target function
An example is a pair (x, f(x)), e.g., a game position paired with the value +1
Problem: find a(n) hypothesis h such that h ≈ f, given a training set of examples
(This is a highly simplified model of real learning:
– Ignores prior knowledge
– Assumes a deterministic, observable “environment”
– Assumes examples are given
– Assumes that the agent wants to learn f; why?)


Inductive learning method

Construct/adjust h to agree with f on training set
(h is consistent if it agrees with f on all examples)
E.g., curve fitting:

[Figure: a sequence of candidate curves h of increasing complexity fitted to the same data points, f(x) vs. x]


**Ockham’s razor**: maximize a combination of consistency and simplicity
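The tension between consistency and simplicity can be made concrete in a few lines of Python. This is a sketch of my own, not from the slides: `lagrange` builds the maximally complex consistent hypothesis (an exact interpolating polynomial through all the points); when the data happen to lie on a line, the interpolant collapses to that line, so the simpler hypothesis loses nothing and Ockham’s razor says prefer it.

```python
def lagrange(points):
    """Degree-(n-1) interpolating polynomial through n points: always
    consistent with the training set, but usually not the simplest h."""
    def h(x):
        total = 0.0
        for i, (xi, yi) in enumerate(points):
            term = yi
            for j, (xj, _) in enumerate(points):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return h

# Four collinear training points: the line h(x) = x is consistent AND simple
data = [(0, 0), (1, 1), (2, 2), (3, 3)]
h = lagrange(data)
print(h(1.5))  # 1.5
```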


Attribute-based representations

Examples described by attribute values (Boolean, discrete, continuous, etc.)
E.g., situations where I will/won’t wait for a table:

| Example | Alt | Bar | Fri | Hun | Pat | Price | Rain | Res | Type | Est | WillWait |
|---|---|---|---|---|---|---|---|---|---|---|---|
| X1 | T | F | F | T | Some | $$$ | F | T | French | 0–10 | T |
| X2 | T | F | F | T | Full | $ | F | F | Thai | 30–60 | F |
| X3 | F | T | F | F | Some | $ | F | F | Burger | 0–10 | T |
| X4 | T | F | T | T | Full | $ | F | F | Thai | 10–30 | T |
| X5 | T | F | T | F | Full | $$$ | F | T | French | >60 | F |
| X6 | F | T | F | T | Some | $$ | T | T | Italian | 0–10 | T |
| X7 | F | T | F | F | None | $ | T | F | Burger | 0–10 | F |
| X8 | F | F | F | T | Some | $$ | T | T | Thai | 0–10 | T |
| X9 | F | T | T | F | Full | $ | T | F | Burger | >60 | F |
| X10 | T | T | T | T | Full | $$$ | F | T | Italian | 10–30 | F |
| X11 | F | F | F | F | None | $ | F | F | Thai | 0–10 | F |
| X12 | T | T | T | T | Full | $ | F | F | Burger | 30–60 | T |

**Classification of examples is positive (T) or negative (F)**


Decision trees

One possible representation for hypotheses
E.g., here is the “true” tree for deciding whether to wait:

Patrons?
├── None → F
├── Some → T
└── Full → WaitEstimate?
    ├── >60 → F
    ├── 30–60 → Alternate?
    │   ├── No → Reservation?
    │   │   ├── No → Bar?
    │   │   │   ├── No → F
    │   │   │   └── Yes → T
    │   │   └── Yes → T
    │   └── Yes → Fri/Sat?
    │       ├── No → F
    │       └── Yes → T
    ├── 10–30 → Hungry?
    │   ├── No → T
    │   └── Yes → Alternate?
    │       ├── No → T
    │       └── Yes → Raining?
    │           ├── No → F
    │           └── Yes → T
    └── 0–10 → T


Expressiveness

Decision trees can express any function of the input attributes. E.g., for Boolean functions, truth table row → path to leaf:

| A | B | A xor B |
|---|---|---|
| F | F | F |
| F | T | T |
| T | F | T |
| T | T | F |

A?
├── F → B?
│   ├── F → F
│   └── T → T
└── T → B?
    ├── F → T
    └── T → F

Trivially, there is a consistent decision tree for any training set with one path to a leaf for each example (unless f is nondeterministic in x), but it probably won’t generalize to new examples
Prefer to find more compact decision trees
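The row-to-path mapping can be checked mechanically. A minimal sketch, assuming an `(attribute, {value: subtree})` tuple encoding for internal nodes and bare labels for leaves; the `classify` helper is my own, not from the slides:

```python
# The A xor B tree: internal node = (attribute, {value: subtree}), leaf = label
xor_tree = ('A', {False: ('B', {False: False, True: True}),
                  True:  ('B', {False: True,  True: False})})

def classify(tree, example):
    # Follow one root-to-leaf path, testing one attribute per level
    while isinstance(tree, tuple):
        attribute, branches = tree
        tree = branches[example[attribute]]
    return tree

# Every truth-table row is reproduced by its corresponding path
for a in (False, True):
    for b in (False, True):
        assert classify(xor_tree, {'A': a, 'B': b}) == (a != b)
```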


Hypothesis spaces

How many distinct decision trees with n Boolean attributes??
= number of Boolean functions
= number of distinct truth tables with 2^n rows = 2^(2^n)
E.g., with 6 Boolean attributes, there are 18,446,744,073,709,551,616 trees
How many purely conjunctive hypotheses (e.g., Hungry ∧ ¬Rain)??
Each attribute can be in (positive), in (negative), or out ⇒ 3^n distinct conjunctive hypotheses
More expressive hypothesis space
– increases chance that target function can be expressed
– increases number of hypotheses consistent w/ training set
⇒ may get worse predictions
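Both counts are easy to verify directly; a quick sketch (function names are my own):

```python
def num_boolean_functions(n):
    # Truth table has 2**n rows; each row's output is T or F,
    # giving 2**(2**n) distinct functions (= distinct decision trees)
    return 2 ** (2 ** n)

def num_conjunctive_hypotheses(n):
    # Each attribute is in positively, in negatively, or left out: 3**n
    return 3 ** n

print(num_boolean_functions(6))       # 18446744073709551616
print(num_conjunctive_hypotheses(6))  # 729
```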


Decision tree learning

Aim: find a small tree consistent with the training examples
Idea: (recursively) choose “most significant” attribute as root of (sub)tree

function DTL(examples, attributes, default) returns a decision tree
  if examples is empty then return default
  else if all examples have the same classification then return the classification
  else if attributes is empty then return Mode(examples)
  else
    best ← Choose-Attribute(attributes, examples)
    tree ← a new decision tree with root test best
    for each value vi of best do
      examplesi ← {elements of examples with best = vi}
      subtree ← DTL(examplesi, attributes − best, Mode(examples))
      add a branch to tree with label vi and subtree subtree
    return tree
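The pseudocode above can be sketched as runnable Python. This is my own rendering, not the book’s code: examples are assumed to be `(attribute-dict, classification)` pairs, trees are `(attribute, {value: subtree})` tuples with bare labels as leaves, and Choose-Attribute is filled in with minimum remaining entropy (equivalent to maximum information gain, covered below):

```python
import math
from collections import Counter

def entropy(labels):
    # Entropy of a list of classifications
    counts = Counter(labels)
    return sum(-c / len(labels) * math.log2(c / len(labels))
               for c in counts.values())

def mode(examples):
    # Most common classification among the examples
    return Counter(y for _, y in examples).most_common(1)[0][0]

def choose_attribute(attributes, examples):
    # Stand-in for Choose-Attribute: minimize entropy remaining after split
    def remainder(a):
        groups = {}
        for x, y in examples:
            groups.setdefault(x[a], []).append(y)
        return sum(len(g) / len(examples) * entropy(g)
                   for g in groups.values())
    return min(attributes, key=remainder)

def dtl(examples, attributes, default):
    if not examples:
        return default
    classes = {y for _, y in examples}
    if len(classes) == 1:
        return classes.pop()
    if not attributes:
        return mode(examples)
    best = choose_attribute(attributes, examples)
    branches = {}
    for v in sorted({x[best] for x, _ in examples}):
        subset = [(x, y) for x, y in examples if x[best] == v]
        branches[v] = dtl(subset, [a for a in attributes if a != best],
                          mode(examples))
    return (best, branches)

# Toy usage: a single attribute decides the class
exs = [({'Pat': 'Some'}, True), ({'Pat': 'None'}, False)]
print(dtl(exs, ['Pat'], False))  # ('Pat', {'None': False, 'Some': True})
```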


Choosing an attribute

Idea: a good attribute splits the examples into subsets that are (ideally) “all positive” or “all negative”

[Figure: the 12 examples split by Patrons? (None / Some / Full) versus by Type? (French / Italian / Thai / Burger)]

Patrons? is a better choice: it gives information about the classification


Information

Information answers questions
The more clueless I am about the answer initially, the more information is contained in the answer
Scale: 1 bit = answer to Boolean question with prior ⟨0.5, 0.5⟩
Information in an answer when prior is ⟨P1, …, Pn⟩ is
H(⟨P1, …, Pn⟩) = Σ_{i=1}^{n} −Pi log2 Pi
(also called entropy of the prior)
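As a sanity check, the entropy formula in a few lines of Python (the helper name is my own):

```python
import math

def entropy(priors):
    # H(<P1,...,Pn>) = sum_i -Pi * log2(Pi); terms with Pi = 0 contribute 0
    return sum(-p * math.log2(p) for p in priors if p > 0)

print(entropy([0.5, 0.5]))    # 1 bit: a fair Boolean question
print(entropy([0.99, 0.01]))  # far below 1 bit: the answer is nearly known
```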


Information contd.

Suppose we have p positive and n negative examples at the root
⇒ H(⟨p/(p+n), n/(p+n)⟩) bits needed to classify a new example
E.g., for 12 restaurant examples, p = n = 6, so we need 1 bit
An attribute splits the examples E into subsets Ei, each of which (we hope) needs less information to complete the classification
Let Ei have pi positive and ni negative examples
⇒ H(⟨pi/(pi+ni), ni/(pi+ni)⟩) bits needed to classify a new example
⇒ expected number of bits per example over all branches is
Σ_i ((pi + ni)/(p + n)) H(⟨pi/(pi+ni), ni/(pi+ni)⟩)
For Patrons?, this is 0.459 bits; for Type this is (still) 1 bit
⇒ choose the attribute that minimizes the remaining information needed
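The 0.459 and 1.0 figures quoted above can be reproduced from the branch counts in the restaurant data; a sketch (function names are my own):

```python
import math

def b(q):
    # Entropy of a Boolean variable that is true with probability q
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def remainder(splits):
    # splits: one (p_i, n_i) pair of positive/negative counts per branch
    total = sum(p + n for p, n in splits)
    return sum((p + n) / total * b(p / (p + n)) for p, n in splits)

# Patrons? -> None (0+, 2-), Some (4+, 0-), Full (2+, 4-)
print(round(remainder([(0, 2), (4, 0), (2, 4)]), 3))          # 0.459
# Type? -> French (1+, 1-), Italian (1+, 1-), Thai (2+, 2-), Burger (2+, 2-)
print(round(remainder([(1, 1), (1, 1), (2, 2), (2, 2)]), 3))  # 1.0
```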


Example contd.

Decision tree learned from the 12 examples:

Patrons?
├── None → F
├── Some → T
└── Full → Hungry?
    ├── No → F
    └── Yes → Type?
        ├── French → T
        ├── Italian → F
        ├── Thai → Fri/Sat?
        │   ├── No → F
        │   └── Yes → T
        └── Burger → T
**Substantially simpler than the “true” tree: a more complex hypothesis isn’t justified by the small amount of data**

Performance measurement

How do we know that h ≈ f ? (Hume’s Problem of Induction)
1) Use theorems of computational/statistical learning theory
2) Try h on a new test set of examples
(use same distribution over example space as training set)
Learning curve = % correct on test set as a function of training set size

[Figure: learning curve, % correct on test set (0.4 to 1.0) vs. training set size (0 to 100)]
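The evaluation loop behind a learning curve is short. A sketch of my own: `learning_curve` trains on random subsets of increasing size and scores on a fixed held-out test set; the `majority_fit` baseline learner and all names here are assumptions for illustration, not from the slides:

```python
import random
from collections import Counter

def majority_fit(examples):
    # Baseline learner: always predict the most common training label
    label = Counter(y for _, y in examples).most_common(1)[0][0]
    return lambda x: label

def learning_curve(train, test, fit, sizes, trials=20):
    # % correct on the held-out test set as a function of training-set size,
    # averaged over random training samples of each size
    curve = []
    for m in sizes:
        accs = []
        for _ in range(trials):
            h = fit(random.sample(train, m))
            accs.append(sum(h(x) == y for x, y in test) / len(test))
        curve.append(sum(accs) / trials)
    return curve
```

A real curve would pass `dtl`-style learners as `fit`; the baseline just shows the protocol: train on m examples, score on unseen ones, repeat.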


Performance measurement contd.

Learning curve depends on
– realizable (can express target function) vs. non-realizable
(non-realizability can be due to missing attributes or restricted hypothesis class, e.g., thresholded linear function)
– redundant expressiveness (e.g., loads of irrelevant attributes)

[Figure: % correct vs. number of examples for realizable, redundant, and nonrealizable hypothesis spaces]


Summary

Learning needed for unknown environments, lazy designers
Learning agent = performance element + learning element
Learning method depends on type of performance element, available feedback, type of component to be improved, and its representation
For supervised learning, the aim is to find a simple hypothesis that is approximately consistent with training examples
Decision tree learning using information gain
Learning performance = prediction accuracy measured on test set
