
Running head: [TITLE]

The Birth of BERT and Its Innovative Impact on Natural Language Processing

[Author]

[Institution]

Author Note

[Grant/funding info and complete correspondence address.]



Abstract

Bidirectional Encoder Representations from Transformers, commonly known as BERT, is considered a game changer in the field of Natural Language Processing (NLP). It is often described as the future of NLP and can be used to further extend the field's capacity and range. However, BERT is known mainly to those who work in computing or who follow emerging technologies. Hence, the goal of this report is to shed light on how BERT can revolutionize traditional natural language processing. It will also analyze the economic and social impacts of BERT on a society that is already technology dependent. Furthermore, it will trace BERT's origin and its direction in the near future. The scope of this paper is limited to a basic understanding of BERT and will not tackle more advanced topics. It is meant to be the tip of the iceberg, and its main purpose is to encourage readers to research BERT further.



Introduction

Natural Language Processing (NLP) is a complex technology used to help computers understand human language. Its main objective is to handle the interaction between humans and machines, which ranges from reading and understanding to deciphering and making sense of human language so that it can be used effectively. NLP is not really a new technology, but thanks to its recent rapid growth it is now considered an emerging technology. Much of its current popularity can be linked to the birth of BERT.

BERT, which stands for Bidirectional Encoder Representations from Transformers, is a machine learning framework for natural language processing. It is based on the Transformer, a deep learning model in which every output element is connected to every input element and the weightings between them are calculated dynamically based on their connection. BERT is also unique compared with older NLP models for two reasons. First, it is bidirectional and learns an unsupervised language representation from plain text. Second, it is completely open source, which allows anyone versed in machine learning to build and modify their own model without relying on a massive labeled dataset for training, saving both time and resources. Its pre-training also draws on a massive range of unlabeled text from the internet, including a book corpus of 800 million words and English Wikipedia, which contributes about 2,500 million words. Since its introduction, traditional NLP models have been rendered increasingly obsolete, as BERT consistently surpasses them in accuracy across a wide range of tasks. It is now considered a game changer that could mark a new era in natural language processing.
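
Because the trained models are openly released, reusing BERT takes only a few lines of code. The following is a minimal sketch, assuming the widely used Hugging Face transformers library and the public bert-base-uncased checkpoint; neither is prescribed by this paper, and they stand in here for any release of the open-source model:

    # A minimal sketch: loading the openly released BERT model and
    # encoding a sentence. Assumes the Hugging Face `transformers`
    # library and PyTorch (pip install transformers torch).
    from transformers import AutoTokenizer, AutoModel

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    # Tokenize a sentence and run it through the pre-trained network.
    inputs = tokenizer("BERT is open source.", return_tensors="pt")
    outputs = model(**inputs)

    # One contextual vector per token; no training data of our own needed.
    print(outputs.last_hidden_state.shape)  # e.g. torch.Size([1, 7, 768])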

The Birth of BERT

The pioneering NLP breakthrough came in 2013, when Mikolov and his colleagues published Word2Vec, widely regarded as the first modern word-representation technique in NLP. It was swiftly followed by GloVe and other word-representation models. These were followed in turn by models that could process textual sequence data: the Recurrent Neural Network (RNN) and the Long Short-Term Memory network (LSTM). Both were considered good enough at the time, but the rapid development of technology would soon demand an even better NLP model.

The Transformer, the main building block of BERT, was first introduced by Google in 2017. The language models of that time relied on recurrent neural networks (RNNs) and convolutional neural networks (CNNs). RNNs and CNNs were competent, but unlike the Transformer they had to process data sequences in a fixed order, which limited them. The Transformer architecture became the basis for BERT, which can be trained on an enormous amount of language data.

In 2018, Google introduced BERT as open source, and it achieved stunning results on eleven NLP tasks. It showed huge improvements, particularly in interpretation and in the classification of emotions (sentiment analysis), and it also performed strongly in semantic role labelling and sentence classification. With these innovations, BERT completely outclassed previous language models such as Word2Vec and GloVe, which struggle to interpret the context of sentences and polysemous words. In October 2019, Google announced that it would start incorporating BERT into its production search algorithm in the United States, and by December 2019 BERT was already being applied to searches in more than 70 different languages.

How BERT Works

NLP, or Natural Language Processing, is a field of computer science with one main goal: for computers to understand human language better and to comprehend it in every possible way. BERT's core task is predicting the words that belong in a blank, using a large repository of training text. It relies on the Transformer, a mechanism that models the contextual relationships between words in a sentence.

mechanism that predicts contextual relationships between words and sentences. Since BERT was

pre-trained in an environment with plain text corpus like the Wikipedia, it never stops learning

even if it is left unchecked. Hence, it continues to improve and adapt the contents and queries of

the users and modify it to individual specifications. This method is called transfer learning which

is the application of the knowledges gathered in completing a certain task. The transformers play

a huge role in the increased capacity of BERT for understanding context and ambiguity of a

certain word. It does this by processing and correlating any selected word to the other words

inside the sentence. Hence, it allows the BERT model to fully predict and understand the full

context of the chosen word and understand the user’s intent on that word. This is different from

the old models like the Word2vec and GloVe since these two would only map a single word in a

vector which is only one dimensional. BERT is the only natural language processing model that

relies on what it called, self-attention tool thanks to the transformers. This is highly relevant

because a certain word may change its meaning depending on its use in the sentence. Thus, the

more words put inside a sentence, the more it became heavily clouded and hard to predict. BERT

solves this problem by reading bidirectionally and considering all the other words inside the

sentence before eliminating the left and right momentum that gave certain words a bias as the

sentence develops. This made BERT the only bidirectional model in all of NLP.
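
The difference from a context-free model can be seen by embedding the same word in two different sentences. A minimal sketch, under the same assumed library and checkpoint as above:

    # A minimal sketch showing that BERT gives the same word different
    # vectors in different contexts, unlike Word2Vec or GloVe.
    # Assumes the Hugging Face `transformers` library and PyTorch.
    import torch
    from transformers import AutoTokenizer, AutoModel

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    def embed_word(sentence, word):
        """Return BERT's contextual vector for `word` inside `sentence`."""
        inputs = tokenizer(sentence, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**inputs).last_hidden_state[0]
        tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
        return hidden[tokens.index(word)]

    a = embed_word("He sat by the river bank.", "bank")
    b = embed_word("She deposited cash at the bank.", "bank")

    # Same surface word, but clearly different vectors: similarity < 1.
    print(torch.cosine_similarity(a, b, dim=0).item())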

BERT and Its Impact on Our Society

Natural Language Processing has redefined what is possible in the fields of computing and artificial intelligence. It is, without doubt, an emerging technology that can play a huge role in the advancement of our modern society. And since BERT is currently considered the best NLP model, it will become the major player in the field. At present, BERT is used in Google Search to optimize its ability to understand queries. It is also expected to have a huge impact on voice search. The voice search we know today is not very accurate, so few people use it; with BERT's accuracy, voice search could become a big part of our daily lives, making things easier for us, especially in the long run.

Another area BERT can change is mental health analysis. NLP models can be used to detect changes in a patient's words by modeling what those words mean. They can measure linguistic features of a conversation that can then be analyzed for treatment. Hence, we gain a new way of interpreting the subtle signals a patient gives off, leaving more room to identify what kind of mental health condition may be present.
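
As a concrete illustration of scoring the emotional tone of a patient's language, the sketch below uses an off-the-shelf sentiment-analysis pipeline from the assumed Hugging Face transformers library; its default checkpoint is a distilled BERT variant, and the example is purely illustrative, not a clinical tool:

    # An illustrative sketch: scoring the emotional tone of short texts
    # with an off-the-shelf sentiment pipeline (the default model is a
    # distilled BERT variant). Not a clinical tool; illustration only.
    from transformers import pipeline

    analyzer = pipeline("sentiment-analysis")

    notes = [
        "I slept well and felt calm today.",
        "Nothing feels worth doing anymore.",
    ]
    for note in notes:
        result = analyzer(note)[0]
        print(f"{result['label']} ({result['score']:.2f}): {note}")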

Overall, NLP and its most advanced model, BERT, will make human-computer interaction faster and easier. This in turn will have a huge impact on our already technology-dependent society. It can set the pace of our progress and will have the power to shape the information delivered to ordinary people over the internet. Used well, it will be of great help, especially in education; used badly, for example to spread unverified information, it can become a wall that hinders our progress and sows chaos in our society. After all, words and information can be used either to help people or to control them without their knowledge.

Conclusion

BERT is a powerful natural language processing model that greatly enhances the capacity of computers to interact with humans. Because of it, the number of organizations taking an interest in NLP has risen significantly. As of now, however, BERT can be applied only to a limited class of problems, which means it is still a developing technology in need of further research and experimentation. Its true potential will only be realized once the model is adopted into various real-time operations. Hence, only time will tell whether BERT will continue to improve and dominate the NLP field, or whether it will be replaced by a more advanced competitor.

