
Conditional Random Fields - A probabilistic graphical model

Yen-Chin Lee

Outline

Labeling sequence data problem
Introduction to conditional random fields (CRFs)
Different views on building a conditional random field (CRF):
  From directed to undirected graphical models
  From generative to discriminative models
  Sequence models
From HMMs to CRFs
Difference between MEMMs & CRFs
Parameter estimation / inference
Experiments

Labeling Sequence Data

X is a random variable over data sequences
Y is a random variable over label sequences
Each Yi is assumed to range over a finite label set A

The problem: learn how to give labels y from the label set to a data sequence x

Example: X = (Thinking, is, being), Y = (noun, verb, noun)

Applications

Computational biology
Computational linguistics
Information extraction


Conditional Random Fields

A form of discriminative model

Has been used successfully in various domains such as part-of-speech tagging and other natural language processing tasks

$$P(y \mid x) = \frac{1}{Z(x)} \exp\Big( \sum_{t}\sum_{i} \lambda_i\, f_i(x, y_t) + \sum_{t}\sum_{j} \mu_j\, g_j(x, y_t, y_{t-1}) \Big)$$

Undirected acyclic graph
Allows some transitions to vote more strongly than others, depending on the corresponding observations
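To make the formula concrete, here is a minimal sketch that computes P(y | x) for a tiny linear-chain CRF by brute-force enumeration of Z(x). The labels, feature weights, and sentence are illustrative assumptions, not from the slides; a real implementation would compute Z(x) with the forward algorithm rather than enumeration.

```python
import itertools
import math

# Toy linear-chain CRF: P(y|x) = exp(state + transition scores) / Z(x)
# Labels and weights below are made up for illustration.
LABELS = ["noun", "verb"]

def sequence_score(x, y, w_state, w_trans):
    """Unnormalized score exp(sum_t f(x, y_t) + sum_t g(x, y_t, y_{t-1}))."""
    score = 0.0
    for t, (word, label) in enumerate(zip(x, y)):
        score += w_state.get((word, label), 0.0)          # state feature f_i
        if t > 0:
            score += w_trans.get((y[t - 1], label), 0.0)  # transition feature g_j
    return math.exp(score)

def probability(x, y, w_state, w_trans):
    # Z(x) sums the unnormalized score over *all* label sequences of length len(x)
    Z = sum(sequence_score(x, y_alt, w_state, w_trans)
            for y_alt in itertools.product(LABELS, repeat=len(x)))
    return sequence_score(x, y, w_state, w_trans) / Z

if __name__ == "__main__":
    x = ["Thinking", "is", "being"]
    w_state = {("Thinking", "noun"): 1.0, ("is", "verb"): 1.5, ("being", "noun"): 0.5}
    w_trans = {("noun", "verb"): 0.8, ("verb", "noun"): 0.8}
    print(probability(x, ("noun", "verb", "noun"), w_state, w_trans))
```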

Motivation
[Figure: diagram relating the models discussed here: Naive Bayes, Hidden Markov Model, and Bayesian Network on the generative/directed side; Logistic Regression, Linear-Chain Conditional Random Field, General Conditional Random Field, and Markov Random Field on the discriminative/undirected side.]

Directed vs. Undirected Models


Directed models
Use a conditional probability for each local substructure
Called Bayesian networks

Undirected models
Use potential functions for each local substructure
Called Markov random fields (Markov networks)

Generative vs. discriminative models


[Same model-relationship diagram as above, contrasting the generative models (Naive Bayes, Hidden Markov Model, Bayesian Network) with their discriminative counterparts (Logistic Regression, Linear-Chain CRF, General CRF, Markov Random Field).]

Generative vs. discriminative models


Generative (Naive Bayes): based on a model of the joint distribution P(y, x); needs to calculate P(x)
Discriminative (Logistic Regression): based on a model of the conditional distribution P(y | x); does not need P(x)
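The dependence on P(x) comes from turning a joint model into a conditional one; writing this out makes the contrast explicit:

$$P(y \mid x) = \frac{P(y, x)}{P(x)} = \frac{P(y, x)}{\sum_{y'} P(y', x)}$$

A discriminative model parameterizes P(y | x) directly and never has to model or sum over the inputs x.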

Overview: sequence models


[Same model-relationship diagram as above, highlighting the sequence models: the Hidden Markov Model and the Linear-Chain CRF.]

Sequence models: HMMs

Power of graphical models: modeling many interdependent variables
An HMM models the joint distribution p(x, y)

Uses two independence assumptions to do it tractably:

Given its direct predecessor, each state is independent of its earlier ancestors
Each observation depends only on the current state
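Under these two assumptions the joint distribution factorizes over the chain (with y_0 taken as a fixed initial state):

$$p(\mathbf{x}, \mathbf{y}) = \prod_{t=1}^{T} p(y_t \mid y_{t-1})\, p(x_t \mid y_t)$$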

From HMMs to linear chain CRFs (1)

Key: the conditional distribution p(y|x) of an HMM is a CRF with a particular choice of feature functions, with weights

$$\theta_{ij} = \log p(y' = i \mid y = j)$$
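As a sketch of that "particular choice of feature functions" (the indicator-feature notation below is the standard rewriting, and an assumption about what the original slide showed): the HMM joint distribution can be written as

$$p(\mathbf{y}, \mathbf{x}) = \frac{1}{Z} \exp\Big( \sum_{t}\sum_{i,j} \theta_{ij}\, \mathbf{1}\{y_t = i\}\, \mathbf{1}\{y_{t-1} = j\} + \sum_{t}\sum_{i}\sum_{o} \mu_{oi}\, \mathbf{1}\{y_t = i\}\, \mathbf{1}\{x_t = o\} \Big)$$

with $\theta_{ij} = \log p(y' = i \mid y = j)$, $\mu_{oi} = \log p(x = o \mid y = i)$, and $Z = 1$. Conditioning on x then yields a linear-chain CRF whose features are exactly these indicator functions.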

From HMMs to linear chain CRFs (2)

Last step: write the conditional probability p(y|x) for the HMM. A linear-chain conditional random field is then defined as any distribution p(y|x) that takes the form shown below.
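A standard way to write that form (a sketch; the exact notation on the original slide may differ) is:

$$p(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})} \prod_{t=1}^{T} \exp\Big( \sum_{k} \lambda_k f_k(y_t, y_{t-1}, \mathbf{x}, t) \Big), \qquad Z(\mathbf{x}) = \sum_{\mathbf{y}'} \prod_{t=1}^{T} \exp\Big( \sum_{k} \lambda_k f_k(y'_t, y'_{t-1}, \mathbf{x}, t) \Big)$$

Note that Z(x) sums over all label sequences, i.e. the normalization is global rather than per state.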

Maximum Entropy Markov Models (MEMMs)


A conditional model that represents the probability of reaching a state given an observation and the previous state
Considers observation sequences to be events to be conditioned upon
Given a training set X with label sequences Y:

Train a model θ that maximizes P(Y | X, θ)
For a new data sequence x, the predicted label sequence y maximizes P(y | x, θ)
Notice the per-state normalization
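A sketch of what per-state normalization looks like (standard MEMM form; the exact notation is an assumption): each local transition distribution is its own exponential model,

$$P(y_t \mid y_{t-1}, x_t) = \frac{1}{Z(y_{t-1}, x_t)} \exp\Big( \sum_{k} \lambda_k f_k(y_t, y_{t-1}, x_t) \Big)$$

where $Z(y_{t-1}, x_t)$ sums only over the possible next states $y_t$, so the probability mass is renormalized at every state.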

MEMMs (contd)

MEMMs have all the advantages of conditional models
Per-state normalization: all the mass that arrives at a state must be distributed among the possible successor states
Subject to the label bias problem

Bias toward states with fewer outgoing transitions

Label Bias Problem


Consider this MEMM:

P(1 and 2 | ro) = P(2 | 1 and ro) P(1 | ro) = P(2 | 1 and o) P(1 | r)
P(1 and 2 | ri) = P(2 | 1 and ri) P(1 | ri) = P(2 | 1 and i) P(1 | r)

Since P(2 | 1 and x) = 1 for all x, P(1 and 2 | ro) = P(1 and 2 | ri)
In the training data, label value 2 is the only label value observed after label value 1
Therefore P(2 | 1) = 1, so P(2 | 1 and x) = 1 for all x
However, we expect P(1 and 2 | ri) to be greater than P(1 and 2 | ro)
Per-state normalization does not allow the model to express this preference
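A minimal numeric sketch of this argument. The per-state transition tables below are made up (as are the competing labels 4 and 5 on the other branch); only their structure matters: label 1 has a single successor, so per-state normalization forces its outgoing probability to 1 regardless of the observation.

```python
# P(first label | first observation): after "r" the model can branch to label 1 or 4.
p_first = {"r": {1: 0.5, 4: 0.5}}

# P(next label | previous label, observation). Label 1 has exactly one successor
# (label 2), so per-state normalization gives it probability 1 for any observation.
p_next = {
    (1, "i"): {2: 1.0},
    (1, "o"): {2: 1.0},
    (4, "i"): {5: 1.0},
    (4, "o"): {5: 1.0},
}

def path_prob(labels, observations):
    """P(label sequence | observation sequence) in the per-state normalized model."""
    prob = p_first[observations[0]][labels[0]]
    for t in range(1, len(labels)):
        prob *= p_next[(labels[t - 1], observations[t])][labels[t]]
    return prob

# Identical probabilities, even if "i" were the only observation ever seen
# on the 1 -> 2 transition in training: this is the label bias problem.
print(path_prob([1, 2], ["r", "i"]))  # 0.5
print(path_prob([1, 2], ["r", "o"]))  # 0.5
```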

Solve the Label Bias Problem

Change the state-transition structure of the model

Not always practical to change the set of states

Principles in parameter estimation

Basic principle: maximum likelihood estimation with the conditional log likelihood

$$\ell(\theta) = \sum_{i=1}^{N} \log p\big(y^{(i)} \mid x^{(i)}\big)$$

advantage: conditional log likelihood is concave, therefore every local optimum is a global one

Differentiating the log-likelihood function with respect to the parameters λ_j gives:
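The resulting gradient has the standard "empirical feature counts minus expected feature counts" form (the exact notation on the original slide is assumed):

$$\frac{\partial \ell}{\partial \lambda_j} = \sum_{i=1}^{N}\sum_{t} f_j\big(y^{(i)}_t, y^{(i)}_{t-1}, x^{(i)}, t\big) \;-\; \sum_{i=1}^{N}\sum_{t}\sum_{y,\,y'} f_j\big(y, y', x^{(i)}, t\big)\, p\big(y_t = y,\, y_{t-1} = y' \mid x^{(i)}\big)$$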

Principles in parameter estimation

There is no analytical solution for the parameters that maximize the log-likelihood
Setting the gradient to zero and solving for θ does not, in general, yield a closed-form solution
Iterative techniques are adopted:
Iterative scaling
Gradient descent

Using gradient descent (quasi-Newton methods), the runtime is O(T M² N G)
T: length of the sequences
M: number of labels
N: number of training instances
G: number of required gradient computations
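A minimal end-to-end sketch of gradient-based training on toy data (brute-force likelihood, numerically approximated gradients via L-BFGS, and made-up features; a real implementation would use forward-backward recursions and analytic gradients):

```python
import itertools
import numpy as np
from scipy.optimize import minimize

# Toy linear-chain CRF: 2 labels, indicator features for (word, label) and
# (prev_label, label) pairs, brute-force normalization. Illustrative only.
LABELS = [0, 1]            # 0 = noun, 1 = verb (assumed)
VOCAB = {"Thinking": 0, "is": 1, "being": 2}
N_FEATS = len(VOCAB) * len(LABELS) + len(LABELS) ** 2

def feature_counts(x, y):
    """Vector of feature counts: state features first, then transition features."""
    f = np.zeros(N_FEATS)
    for t, (word, label) in enumerate(zip(x, y)):
        f[VOCAB[word] * len(LABELS) + label] += 1.0
        if t > 0:
            f[len(VOCAB) * len(LABELS) + y[t - 1] * len(LABELS) + label] += 1.0
    return f

def neg_log_likelihood(w, data):
    """Negative conditional log likelihood plus a small L2 penalty to keep it well-posed."""
    nll = 0.05 * w @ w
    for x, y in data:
        scores = {y_alt: w @ feature_counts(x, y_alt)
                  for y_alt in itertools.product(LABELS, repeat=len(x))}
        log_Z = np.logaddexp.reduce(list(scores.values()))
        nll -= scores[tuple(y)] - log_Z
    return nll

if __name__ == "__main__":
    # One training sentence labeled noun(0) verb(1) noun(0).
    data = [(["Thinking", "is", "being"], (0, 1, 0))]
    result = minimize(neg_log_likelihood, np.zeros(N_FEATS),
                      args=(data,), method="L-BFGS-B")
    print("trained weights:", np.round(result.x, 3))
```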

Summary of the relations among model structures

Experiment

Modeling the label bias problem

Each run consists of 2,000 training examples and 500 test examples, trained to convergence using the iterative scaling algorithm
CRF error is 4.6%, MEMM error is 42%
The MEMM fails to discriminate between the two branches
The CRF solves the label bias problem

MEMM vs. HMM


The HMM outperforms the MEMM

MEMM vs. CRF


CRF usually outperforms the MEMM

Summary

Discriminative models with per-state normalization (such as MEMMs) are prone to the label bias problem
CRFs provide the benefits of discriminative models
CRFs solve the label bias problem and demonstrate good performance

Thanks!!
