Neural Network

Language Modeling
Table of Contents
• Introduction to neural networks
• Activation functions
• Types of neural networks
• Learning in neural networks
• Applications of neural networks
• Advantages of neural networks
• Disadvantages of neural networks
• Introduction to language modeling
• Neural network language modeling
Neural Networks
Computing based on interaction of multiple connected processing elements
Powerful in doing many processes
Ability to adapt and learn
Ability to deal with incomplete data
Basics of neural networks
Developed from biological approaches
Developed in 1943
Has one or more layers
Has different connection types, including feedforward and feedback networks
A mathematical model of biological neuron
Neural network neurons
Receives inputs
Multiplies each input by its weight
Applies an activation function to the sum of the results
Outputs the result
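The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `sigmoid` and `neuron`, the choice of a sigmoid activation, and the input and weight values are all assumptions made for the example.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, activation=sigmoid):
    """One artificial neuron: weighted sum followed by an activation."""
    # Multiply each input by its weight and sum the results.
    total = sum(x * w for x, w in zip(inputs, weights))
    # Apply the activation function to the sum and output the result.
    return activation(total)

# Hypothetical inputs and weights, chosen only for illustration.
output = neuron([1.0, 0.5, -1.0], [0.4, 0.2, 0.5])
print(output)
```

Passing a different function as `activation` swaps the nonlinearity without changing the rest of the unit.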
Neural network advantages
Neural networks can perform better than normal linear programs
Because neural networks are parallel, they can keep working even when some elements fail
Can be implemented in variety of applications
Ability to derive meaning from complicated or imprecise data
Neural network disadvantages
Neural networks need training data
Training requires high processing time
Activation function
Control the activity of the unit
Threshold function outputs 1 when it is active and 0 when it is inactive
Some examples of activation functions:
Sigmoid(x) = 1 / (1 + e^{-x})
Tanh(x) = 2 / (1 + e^{-2x}) - 1, with range (-1, 1)
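As a quick check of the formulas above, the following sketch implements the threshold, sigmoid, and tanh functions directly. The threshold position `theta` is an assumed parameter (the slides do not give one), and the function names are illustrative.

```python
import math

def threshold(x, theta=0.0):
    """Step function: 1 when the unit is active, 0 when it is inactive.
    theta is an assumed activation threshold, not given in the slides."""
    return 1 if x >= theta else 0

def sigmoid(x):
    """Sigmoid(x) = 1 / (1 + e^(-x)), with outputs in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Tanh(x) = 2 / (1 + e^(-2x)) - 1, with outputs in (-1, 1)."""
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0

for x in (-2.0, 0.0, 2.0):
    print(x, threshold(x), round(sigmoid(x), 4), round(tanh(x), 4))
```

The tanh formula used here is the sigmoid rescaled: Tanh(x) = 2 * Sigmoid(2x) - 1.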
Types of neural networks
Connection type:
◦ Static
◦ Dynamic

Topology:
◦ Single layer
◦ Multiple layers
◦ Recurrent

Learning method:
◦ Supervised
◦ Unsupervised
◦ Reinforcement
Static connections: as in feedforward networks
Dynamic connections: as in feedback networks, whose connections change when the signal travels backwards
[Figure: single-layer, multilayer, and recurrent networks]
Learning methods
Supervised learning
◦ Each learning pattern: input + desired output
◦ At each presentation: adapts weights
◦ After many epochs: convergence to a local minimum
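The supervised loop above (pattern = input + desired output, weights adapted at each presentation, repeated over epochs) can be sketched with the classic perceptron learning rule. This rule is swapped in here as one simple instance of supervised weight adaptation; the slides do not name a specific algorithm. The AND task, the learning rate, and the epoch count are assumptions for the example.

```python
def step(x):
    """Threshold activation: 1 if the weighted sum is non-negative."""
    return 1 if x >= 0 else 0

def predict(weights, bias, inputs):
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)

def train_perceptron(patterns, epochs=20, lr=1.0):
    """Perceptron rule: after each presentation, move the weights
    in proportion to (desired output - actual output)."""
    weights = [0.0] * len(patterns[0][0])
    bias = 0.0
    for _ in range(epochs):                      # one epoch = one pass over all patterns
        for inputs, target in patterns:          # each pattern: input + desired output
            error = target - predict(weights, bias, inputs)
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias

# Hypothetical training set: the logical AND function.
and_patterns = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_patterns)
predictions = [predict(weights, bias, inputs) for inputs, _ in and_patterns]
print(weights, bias, predictions)
```

Because AND is linearly separable, the error reaches zero after a few epochs and the weights stop changing.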
Unsupervised learning
No help from outside
Learning by doing
Pick out structures in the input:
◦ Clustering
◦ Dimensionality reduction
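Clustering, the first structure listed above, can be illustrated without any neural network at all. The sketch below uses one-dimensional k-means as an assumed stand-in for "picking out structure" without labels; the data points and the starting centers are made up for the example.

```python
def kmeans_1d(points, centers, iterations=10):
    """Plain k-means on scalars: assign each point to its nearest
    center, then move each center to the mean of its points."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical unlabeled data with two visible groups.
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centers, clusters = kmeans_1d(points, centers=[0.0, 6.0])
print(centers, clusters)
```

No desired output is supplied anywhere: the grouping comes only from the distances between the inputs.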
Reinforcement learning
Inspired by behaviorist psychologists
Teacher: Training data
The teacher scores the network's performance on the training examples
The score is used to perturb the weights randomly
Relatively slow in learning due to randomness
Examples: Robotic tasks
Neural network applications
Pattern recognition
Investments analysis
Control systems and monitoring
Mobile computing
Natural language processing
Forecasting sales, market, meteorology
Language modeling
Filtering out bad sentences
Model the sentences via probability distribution over sequences of words:
Assign a probability to a given sentence:
S1 = “The cat jumped over the dog”, Pr(S1) ~1
S2 = “The Over cat dog jumped”, Pr(S2) ~0
Language modeling applications
Machine translation:
◦ P(high winds tonight) > P(large winds tonight)

Spell correction:
◦ The office is about fifteen minuets from my house
◦ P(about fifteen minutes from) > P(about fifteen minuets from)

Speech Recognition:
◦ P(I saw a van) > P(eyes awe of an)

Summarization, question-answering, etc.

N-gram models

A sentence s = (x_1, x_2, ..., x_T)

How likely is s?
p(x_1, x_2, ..., x_T)
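The slides do not spell out how this joint probability is computed. Under the standard treatment, it is factored with the chain rule, and an n-gram model then approximates each factor by conditioning only on the previous n-1 words (the Markov assumption; n is the model order):

$$
\begin{aligned}
p(x_1, x_2, \ldots, x_T) &= \prod_{t=1}^{T} p(x_t \mid x_1, \ldots, x_{t-1}) \\
&\approx \prod_{t=1}^{T} p(x_t \mid x_{t-n+1}, \ldots, x_{t-1})
\end{aligned}
$$

Each conditional probability is usually estimated from counts in a training corpus.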
N-gram models
Data sparsity
Lack of generalization:
Seen in training: [ride a horse], [ride a llama]
Never seen, so assigned probability ~0: [ride a zebra]
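A tiny count-based trigram model makes the sparsity problem concrete. This is a sketch under stated assumptions: the three-sentence corpus is invented, the estimate is a plain maximum-likelihood ratio with no smoothing, and the names `trigram_counts`, `context_counts`, and `trigram_prob` are illustrative.

```python
from collections import Counter

# Hypothetical training corpus; "zebra" never appears after "ride a".
corpus = ["ride a horse", "ride a llama", "ride a horse"]

trigram_counts = Counter()
context_counts = Counter()
for sentence in corpus:
    words = sentence.split()
    for i in range(len(words) - 2):
        context = (words[i], words[i + 1])
        trigram_counts[context + (words[i + 2],)] += 1
        context_counts[context] += 1

def trigram_prob(w1, w2, w3):
    """Maximum-likelihood estimate of p(w3 | w1, w2), without smoothing."""
    if context_counts[(w1, w2)] == 0:
        return 0.0
    return trigram_counts[(w1, w2, w3)] / context_counts[(w1, w2)]

print(trigram_prob("ride", "a", "horse"))
print(trigram_prob("ride", "a", "zebra"))
```

The model gives "ride a zebra" a probability of zero even though it is a perfectly reasonable phrase, because counts cannot transfer what was learned about "horse" and "llama" to a similar word.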
One-hot encoding
V = {zebra, horse, school, summer}
V(zebra) = [1,0,0,0]
V(horse) = [0,1,0,0]
V(school) = [0,0,1,0]
V(summer) = [0,0,0,1]
One-hot encoding is simple, yet it leaves word similarity undefined
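The following sketch builds one-hot vectors for the four-word vocabulary V and shows why similarity is undefined: the dot product of any two different words is zero, so "zebra" is exactly as far from "horse" as from "school". The helper names `one_hot` and `dot` are illustrative.

```python
vocabulary = ["zebra", "horse", "school", "summer"]

def one_hot(word):
    """Vector of zeros with a single 1 at the word's vocabulary index."""
    vec = [0] * len(vocabulary)
    vec[vocabulary.index(word)] = 1
    return vec

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

print(one_hot("zebra"), one_hot("summer"))
print(dot(one_hot("zebra"), one_hot("horse")))   # related animals
print(dot(one_hot("zebra"), one_hot("school")))  # unrelated words
```

Both dot products are equal, so the encoding carries no information about which words are related.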
Distributional Representation

Stars: shining, bright, light, dark (likely context words)

Stars: cucumber, computer, lava, Harvard (unlikely context words)
Distributional Representation
It is simple
It has the notion of word similarity
But it can be computationally and memory-wise inefficient
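One common way to realize this idea is to describe each word by the counts of the words that appear around it and to compare words by cosine similarity. The sketch below assumes sentence-level co-occurrence and an invented four-sentence corpus; the function names are illustrative.

```python
import math
from collections import Counter

# Hypothetical corpus, chosen so that "stars" and "moon" share contexts.
corpus = [
    "stars shining bright",
    "stars bright light",
    "moon shining bright",
    "computer runs code",
]

def context_vector(target, sentences):
    """Count every other word that shares a sentence with the target."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        if target in words:
            counts.update(w for w in words if w != target)
    return counts

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u)  # a Counter returns 0 for missing keys
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

stars = context_vector("stars", corpus)
moon = context_vector("moon", corpus)
computer = context_vector("computer", corpus)
print(cosine(stars, moon), cosine(stars, computer))
```

Each vector has one potential entry per vocabulary word, which is where the computational and memory cost comes from on a realistic vocabulary.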
Distributional Representation
V is a vocabulary
w_i ∈ V
v(w_i) ∈ R^n
v(wi) is a low-dimensional, learnable, dense word vector
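In code, such a representation is usually a lookup table from words to short dense vectors. The sketch below assumes n = 3 and small random initial values; in a real model these vectors would be adjusted during training, which is not shown here.

```python
import random

random.seed(0)  # fixed seed so the example is repeatable

vocabulary = ["zebra", "horse", "school", "summer"]
n = 3  # embedding dimension (assumed; far smaller than a real vocabulary)

# One dense, learnable vector per word, randomly initialized.
embeddings = {
    word: [random.uniform(-0.1, 0.1) for _ in range(n)]
    for word in vocabulary
}

def v(word):
    """Look up the dense vector v(w_i) for a word w_i in V."""
    return embeddings[word]

print(v("zebra"))
```

Unlike a one-hot vector, the length n does not grow with the vocabulary, and training can move related words closer together.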
Distributed Representation
Deep learning + language modeling
Traditionally uses architectures such as recurrent neural networks (RNNs)
Advancements in neural networks: LSTM (long short-term memory)
An LSTM is a recurrent unit that can be part of a larger recurrent network.
An LSTM contains: cell, input gate, output gate, forget gate
The cell is the memory; the three gates are conventional artificial neurons

i_t = g(w_{xi} x_t + w_{hi} h_{t-1})
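The slides give only the input-gate equation. For context, the other parts of an LSTM step in the commonly used formulation follow the same pattern. The version below is an assumption rather than the slides' own definition: it omits bias terms and peephole connections to match the equation above, takes g to be the sigmoid, and writes f_t, o_t, c_t, and h_t for the forget gate, output gate, cell state, and hidden state.

$$
\begin{aligned}
i_t &= g(w_{xi} x_t + w_{hi} h_{t-1}) \\
f_t &= g(w_{xf} x_t + w_{hf} h_{t-1}) \\
o_t &= g(w_{xo} x_t + w_{ho} h_{t-1}) \\
\tilde{c}_t &= \tanh(w_{xc} x_t + w_{hc} h_{t-1}) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$

The cell state c_t is the memory: the forget gate decides how much of c_{t-1} to keep, and the input gate decides how much of the new candidate to write.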
Neural Network Language Model
Convolutional networks
CNNs were a breakthrough in image processing; however, they can also be used in language modeling.
Convolutional networks
The input to convolutional neural nets for language modeling can be either word2vec embeddings or one-hot encodings.
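To show what a convolution over such input looks like, the sketch below slides one filter across a sequence of word vectors and produces one number per window. It is a bare illustration with made-up two-dimensional vectors and a made-up filter, without padding, multiple filters, or a nonlinearity, and it is not the architecture of any particular paper.

```python
def conv1d(sequence, kernel):
    """Slide a filter spanning len(kernel) consecutive word vectors
    over the sequence and return one response per window."""
    width = len(kernel)
    outputs = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Elementwise product of window and filter, summed to a scalar.
        response = sum(
            x * k
            for vec, kvec in zip(window, kernel)
            for x, k in zip(vec, kvec)
        )
        outputs.append(response)
    return outputs

# Hypothetical 2-dimensional word vectors for a 4-word sentence.
sequence = [[1, 0], [0, 1], [1, 1], [0, 0]]
# Hypothetical filter covering 2 consecutive words.
kernel = [[1, 0], [0, 1]]

features = conv1d(sequence, kernel)
print(features)
```

The same filter weights are reused at every position, which is what lets a convolutional layer detect a word pattern wherever it occurs in the sentence.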
Based on:
Jozefowicz, Rafal, et al. "Exploring the limits of language modeling." arXiv preprint arXiv:1602.02410 (2016).

The best LSTM performance in language modeling yields a perplexity of 23, and the RNN resulted in a perplexity of 41.
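Perplexity, the metric quoted above, is not defined in the slides. Under the standard definition it is the exponential of the average negative log-probability the model assigns to each word, so lower is better. The sketch below assumes the per-word probabilities are already available; the function name and the example values are illustrative.

```python
import math

def perplexity(word_probs):
    """exp of the mean negative log-probability of the predicted words."""
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / len(word_probs))

# Hypothetical model that gives every word a probability of 1/4.
print(perplexity([0.25, 0.25, 0.25, 0.25]))
```

A perplexity of k can be read as the model being, on average, as uncertain as if it were choosing uniformly among k words.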
