
Running head: [TITLE]

The Birth of BERT and Its Innovative Impact on Natural Language Processing

[Author]

[Institution]

Author Note

[Grant/funding info and complete correspondence address.]



Abstract

Bidirectional Encoder Representations from Transformers, commonly known as BERT, is considered a game changer in the field of Natural Language Processing (NLP). It is often described as the future of NLP and can be used to further extend the field's capacity and range. However, BERT is known mainly to those who work in computing or who follow emerging technologies. Hence, the goal of this report is to shed light on how BERT can revolutionize traditional natural language processing. It will also analyze the economic and social impacts of BERT on a society that is already technology dependent. Furthermore, it will trace BERT's origin and its direction in the near future. The scope of this paper is limited to a basic understanding of BERT and will not tackle more advanced topics. It is meant to be the tip of the iceberg, and its main purpose is to encourage readers to research BERT further.



Introduction

Natural Language Processing (NLP) is a complex technology used to help computers understand human language. Its main objective is to handle the interaction between humans and machines, which ranges from reading and understanding to deciphering and making sense of human language so that it can be used effectively. NLP is not really a new technology, but thanks to its recent rapid growth it is now considered an emerging technology. Much of its current popularity can be linked to the birth of BERT.

BERT, which stands for Bidirectional Encoder Representations from Transformers, is a machine learning framework for natural language processing. It is based on the Transformer, a deep learning model in which every output element is connected to every input element and the weightings between them are calculated dynamically based on their connection. BERT is also unique compared with older NLP models for two reasons. First, it is bidirectional and learns an unsupervised language representation from plain text. Second, it is completely open source, which allows anyone versed in machine learning to build and modify their own model without relying on a massive labeled dataset for training, saving both time and resources. Its pre-training also draws on a massive range of unlabeled text from the internet, including a book corpus of 800 million words and English Wikipedia, which contributes about 2,500 million words. Since its introduction, traditional NLP models have been rendered increasingly obsolete, as BERT consistently surpasses them in accuracy across a wide range of tasks. It is now considered a game changer that could mark a new era in natural language processing.
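
Because the trained models are openly released, reusing BERT takes only a few lines of code. The following is a minimal sketch, assuming the widely used Hugging Face transformers library and the public bert-base-uncased checkpoint; neither is prescribed by this paper, and they stand in here for any release of the open-source model:

    # A minimal sketch: loading the openly released BERT model and
    # encoding a sentence. Assumes the Hugging Face `transformers`
    # library and PyTorch (pip install transformers torch).
    from transformers import AutoTokenizer, AutoModel

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    # Tokenize a sentence and run it through the pre-trained network.
    inputs = tokenizer("BERT is open source.", return_tensors="pt")
    outputs = model(**inputs)

    # One contextual vector per token; no training data of our own needed.
    print(outputs.last_hidden_state.shape)  # e.g. torch.Size([1, 7, 768])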

The Birth of BERT

The pioneering NLP breakthrough came in 2013, when Mikolov and his colleagues published Word2Vec, widely regarded as the first modern word-representation technique in NLP. It was swiftly followed by GloVe and other word-representation models. These were followed in turn by models that could process textual sequence data: the Recurrent Neural Network (RNN) and the Long Short-Term Memory network (LSTM). Both were considered good enough at the time, but the rapid development of technology would soon demand an even better NLP model.

The Transformer, the main building block of BERT, was first introduced by Google in 2017. The language models of that time relied on recurrent neural networks (RNNs) and convolutional neural networks (CNNs). RNNs and CNNs were competent, but unlike the Transformer they had to process data sequences in a fixed order, which limited them. The Transformer architecture became the basis for BERT, which can be trained on an enormous amount of language data.

In 2018, Google introduced BERT as open source, and it achieved stunning results on eleven NLP tasks. It showed huge improvements, particularly in interpretation and in the classification of emotions (sentiment analysis), and it also performed strongly in semantic role labelling and sentence classification. With these innovations, BERT completely outclassed previous language models such as Word2Vec and GloVe, which struggle to interpret the context of sentences and polysemous words. In October 2019, Google announced that it would start incorporating BERT into its production search algorithm in the United States, and by December 2019 BERT was already being applied to searches in more than 70 different languages.

How BERT Works

NLP, or Natural Language Processing, is a field of computer science with one main goal: for computers to understand human language better and to comprehend it in every possible way. BERT's core task is predicting the words that belong in a blank, using a large repository of training text. It relies on the Transformer, a mechanism that models the contextual relationships between words in a sentence.

mechanism that predicts contextual relationships between words and sentences. Since BERT was

pre-trained in an environment with plain text corpus like the Wikipedia, it never stops learning

even if it is left unchecked. Hence, it continues to improve and adapt the contents and queries of

the users and modify it to individual specifications. This method is called transfer learning which

is the application of the knowledges gathered in completing a certain task. The transformers play

a huge role in the increased capacity of BERT for understanding context and ambiguity of a

certain word. It does this by processing and correlating any selected word to the other words

inside the sentence. Hence, it allows the BERT model to fully predict and understand the full

context of the chosen word and understand the user’s intent on that word. This is different from

the old models like the Word2vec and GloVe since these two would only map a single word in a

vector which is only one dimensional. BERT is the only natural language processing model that

relies on what it called, self-attention tool thanks to the transformers. This is highly relevant

because a certain word may change its meaning depending on its use in the sentence. Thus, the

more words put inside a sentence, the more it became heavily clouded and hard to predict. BERT

solves this problem by reading bidirectionally and considering all the other words inside the

sentence before eliminating the left and right momentum that gave certain words a bias as the

sentence develops. This made BERT the only bidirectional model in all of NLP.
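
The difference from a context-free model can be seen by embedding the same word in two different sentences. A minimal sketch, under the same assumed library and checkpoint as above:

    # A minimal sketch showing that BERT gives the same word different
    # vectors in different contexts, unlike Word2Vec or GloVe.
    # Assumes the Hugging Face `transformers` library and PyTorch.
    import torch
    from transformers import AutoTokenizer, AutoModel

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    def embed_word(sentence, word):
        """Return BERT's contextual vector for `word` inside `sentence`."""
        inputs = tokenizer(sentence, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**inputs).last_hidden_state[0]
        tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
        return hidden[tokens.index(word)]

    a = embed_word("He sat by the river bank.", "bank")
    b = embed_word("She deposited cash at the bank.", "bank")

    # Same surface word, but clearly different vectors: similarity < 1.
    print(torch.cosine_similarity(a, b, dim=0).item())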

BERT and Its Impact on Our Society

Natural Language Processing has redefined what is possible in the fields of computing and artificial intelligence. It is, without doubt, an emerging technology that can play a huge role in the advancement of our modern society. And since BERT is currently considered the best NLP model, it will become the major player in the field. At present, BERT is used in Google Search to optimize its ability to understand queries. It is also expected to have a huge impact on voice search. The voice search we know today is not very accurate, so few people use it; with BERT's accuracy, voice search could become a big part of our daily lives, making things easier for us, especially in the long run.

Another area BERT can change is mental health analysis. NLP models can be used to detect changes in a patient's words by modeling what those words mean. They can measure linguistic features of a conversation that can then be analyzed for treatment. Hence, we gain a new way of interpreting the subtle signals a patient gives off, leaving more room to identify what kind of mental health condition may be present.
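
As a concrete illustration of scoring the emotional tone of a patient's language, the sketch below uses an off-the-shelf sentiment-analysis pipeline from the assumed Hugging Face transformers library; its default checkpoint is a distilled BERT variant, and the example is purely illustrative, not a clinical tool:

    # An illustrative sketch: scoring the emotional tone of short texts
    # with an off-the-shelf sentiment pipeline (the default model is a
    # distilled BERT variant). Not a clinical tool; illustration only.
    from transformers import pipeline

    analyzer = pipeline("sentiment-analysis")

    notes = [
        "I slept well and felt calm today.",
        "Nothing feels worth doing anymore.",
    ]
    for note in notes:
        result = analyzer(note)[0]
        print(f"{result['label']} ({result['score']:.2f}): {note}")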

Overall, NLP and its most advanced model, BERT, will make human-computer interaction faster and easier. This in turn will have a huge impact on our already technology-dependent society. It can set the pace of our progress and will have the power to shape the information delivered to ordinary people over the internet. Used well, it will be of great help, especially in education; used badly, for example to spread unverified information, it can become a wall that hinders our progress and sows chaos in our society. After all, words and information can be used either to help people or to control them without their knowledge.

Conclusion

BERT is a powerful natural language processing model that greatly enhances the capacity of computers to interact with humans. Because of it, the number of organizations taking an interest in NLP has risen significantly. As of now, however, BERT can be applied only to a limited class of problems, which means it is still a developing technology in need of further research and experimentation. Its true potential will only be realized once the model is adopted into various real-time operations. Hence, only time will tell whether BERT will continue to improve and dominate the NLP field, or whether it will be replaced by a more advanced competitor.

