B. Word Embedding
A word embedding is a dense vector representation of words and documents. This approach is an enhanced form of the older bag-of-words representation.
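As a minimal sketch of the idea (the vocabulary, vectors, and dimensionality below are illustrative assumptions, not values from this paper), each word maps to a short dense vector instead of a long, mostly-zero one-hot vector:

import numpy as np

# Illustrative only: a tiny vocabulary with 4-dimensional dense embeddings.
# Trained models such as Word2Vec typically learn 100-300 dimensions from a corpus.
embeddings = {
    "king":  np.array([0.50, 0.12, -0.31, 0.77]),
    "queen": np.array([0.48, 0.15, -0.29, 0.80]),
    "apple": np.array([-0.62, 0.44, 0.05, -0.10]),
}

# A one-hot vector, by contrast, needs one dimension per vocabulary word.
vocab = list(embeddings)
one_hot_king = np.eye(len(vocab))[vocab.index("king")]

print(embeddings["king"])   # dense: 4 numbers regardless of vocabulary size
print(one_hot_king)         # sparse: grows with the vocabulary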
Table 1: Co-Occurrence Matrix
Table 1 clearly shows that the dimensionality of each word vector increases linearly with the size of the corpus. As the size increases we face the problem of sparseness: with a million words we would have to build a million-by-million matrix, and the runs of zeros grow accordingly. The big drawback is that a great deal of memory is wasted. The Word2Vec model optimizes this approach and provides a far more compact way of representing words.
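To make the sparsity problem concrete, the following sketch (the toy corpus and window size are illustrative assumptions, not from the paper) builds a co-occurrence matrix; it is vocabulary-by-vocabulary in size, so it grows quadratically with the vocabulary while staying mostly zero:

import numpy as np

# Illustrative toy corpus; a real corpus would contain millions of tokens.
corpus = ["the cat sat on the mat", "the dog sat on the rug"]
window = 1  # assumed context window size

vocab = sorted({w for sent in corpus for w in sent.split()})
index = {w: i for i, w in enumerate(vocab)}

# V x V co-occurrence counts: mostly zeros even for this tiny vocabulary.
cooc = np.zeros((len(vocab), len(vocab)), dtype=int)
for sent in corpus:
    words = sent.split()
    for i, w in enumerate(words):
        for j in range(max(0, i - window), min(len(words), i + window + 1)):
            if i != j:
                cooc[index[w], index[words[j]]] += 1

print(vocab)
print(cooc)
print("fraction of zero entries:", (cooc == 0).mean())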
F. Neural Probabilistic Model
Neural probabilistic language models are traditionally trained using the maximum-likelihood principle to maximize the probability of the next word wt (for "target") given the previous words h (for "history") in terms of a softmax function.
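A minimal sketch of this objective (the vocabulary and scores below are made-up placeholders; a real model would compute the scores with a neural network over the history h): the softmax turns per-word scores into P(wt | h), and maximum-likelihood training minimizes the negative log-likelihood of the observed target word.

import numpy as np

vocab = ["the", "cat", "sat", "mat"]           # illustrative vocabulary
scores = np.array([1.2, 0.3, -0.5, 2.0])       # assumed network outputs for history h

# Softmax over the vocabulary gives P(w_t | h) for every candidate word.
probs = np.exp(scores) / np.exp(scores).sum()

target = "mat"                                 # the word that actually followed h
nll = -np.log(probs[vocab.index(target)])      # maximum likelihood = minimize this

print(dict(zip(vocab, probs.round(3))))
print("negative log-likelihood:", nll)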
D. Vector Space Model
We predict that the context word c_j is the one with the largest probability p(c_j | w_i). Since softmax is a monotonic (order-preserving) function, this amounts to maximizing the dot product w_i^T · c_j:

p(c_j | w_i) = exp(w_i^T · c_j) / Σ_k exp(w_i^T · c_k)    (3)
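A small sketch of this prediction step (the vectors below are illustrative assumptions, not trained embeddings): because softmax preserves ordering, the context word with the largest dot product is also the one with the largest probability under Eq. (3).

import numpy as np

w_i = np.array([0.4, -0.1, 0.7])                # assumed center-word vector
context = {                                     # assumed context-word vectors
    "sat": np.array([0.3, 0.0, 0.6]),
    "mat": np.array([-0.5, 0.2, 0.1]),
    "dog": np.array([0.1, -0.3, 0.2]),
}

names = list(context)
dots = np.array([w_i @ context[c] for c in names])   # w_i^T . c_j for each candidate
probs = np.exp(dots) / np.exp(dots).sum()            # Eq. (3): softmax over the dots

# Softmax is monotonic, so both rankings pick the same word.
assert names[dots.argmax()] == names[probs.argmax()]
print("predicted context word:", names[probs.argmax()])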
Fig5: Cosine Similarity of words
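Fig. 5 refers to the cosine similarity between word vectors; a minimal sketch of how such a similarity is computed (the vectors are illustrative assumptions, not the paper's trained embeddings):

import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

king  = np.array([0.50, 0.12, -0.31, 0.77])    # assumed embeddings
queen = np.array([0.48, 0.15, -0.29, 0.80])
apple = np.array([-0.62, 0.44, 0.05, -0.10])

print(cosine_similarity(king, queen))   # high: related words point in similar directions
print(cosine_similarity(king, apple))   # low: unrelated words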
IV. COMPARATIVE ANALYSIS AND DISCUSSION
Some simple factors can be used to measure the performance of the skip-gram neural network.

Equation 1 shows that the output of the first hidden layer is the product of the input vector V of each center word Cj with the input vector V of each context word Cl, followed by the application of softmax over the product.

Setting a cost of 1 for each arithmetic operation, we can estimate the cost of each neuron. The operations performed between each center word and context word therefore amount to 2V for each softmax operation.
For example, for 1000 input words there will be:

N = V * Cj + softmax(V * C) + (V * C) + softmax(V * C)
N = 1000 * 1000 + 1000 + (1000 * 1000) + 1000
N ≈ 2 x 10^6 + 2 x 10^3 operations for a single neuron
N ≈ 2 x 10^7 operations for 10 neurons per layer
N ≈ 8 x 10^7 operations for 1 input, 2 hidden and 1 output layer
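As a rough check of this arithmetic (the layer and neuron counts are the ones named above; treating every product and softmax term as a unit-cost operation is an assumption of this sketch), the per-neuron count and the scaled totals can be computed directly:

V = 1000  # number of input words in the example

# Per-neuron cost: two V x V products plus two softmax terms, each at unit cost.
per_neuron = V * V + V + V * V + V           # = 2_002_000, about 2 x 10^6

neurons_per_layer = 10
layers = 4                                   # 1 input + 2 hidden + 1 output

per_layer = per_neuron * neurons_per_layer   # about 2 x 10^7
total = per_layer * layers                   # about 8 x 10^7

print(f"per neuron:  {per_neuron:.2e}")
print(f"per layer:   {per_layer:.2e}")
print(f"whole model: {total:.2e}")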
V. CONCLUSION