
ELLY MUNGAI

BSCIT-01-0387/2019

INTRODUCTION TO ARTIFICIAL INTELLIGENCE


SECTION B
1. a) Using a well-labeled diagram, discuss the steps undertaken in natural language processing
Step 1: Sentence Segmentation
The first step in the pipeline is to break the text apart into separate sentences. For a short
document about London, that gives us sentences such as:
“London is the capital and most populous city of England and the United Kingdom.”
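
As an illustrative sketch (not part of the original answer), sentence segmentation could be done
with the spaCy library, one possible choice, assuming the small English model en_core_web_sm
has been installed:

    import spacy

    # Assumes the model was installed with: python -m spacy download en_core_web_sm
    nlp = spacy.load("en_core_web_sm")

    doc = nlp("London is the capital and most populous city of England "
              "and the United Kingdom. It was founded by the Romans, "
              "who named it Londinium.")

    # Sentence segmentation: print each detected sentence on its own line
    for sentence in doc.sents:
        print(sentence.text)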
Step 2: Word Tokenization
Now that we’ve split our document into sentences, we can process them one at a time. Let’s start
with the first sentence from our document:

“London is the capital and most populous city of England and the United Kingdom.”

The next step in our pipeline is to break this sentence into separate words or tokens. This is
called tokenization. This is the result:

“London”, “is”, “the”, “capital”, “and”, “most”, “populous”, “city”, “of”, “England”, “and”,
“the”, “United”, “Kingdom”, “.”

Tokenization is easy to do in English. We’ll just split apart words whenever there’s a space
between them. And we’ll also treat punctuation marks as separate tokens since punctuation also
has meaning.
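
A minimal sketch of tokenization, again assuming the hypothetical spaCy setup above (its tokenizer
splits on spaces and also separates punctuation into its own tokens):

    import spacy
    nlp = spacy.load("en_core_web_sm")   # assumes the model is installed

    doc = nlp("London is the capital and most populous city of England "
              "and the United Kingdom.")

    # Each token is a word or a punctuation mark
    print([token.text for token in doc])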

Step 3: Predicting Parts of Speech for Each Token

Next, we’ll look at each token and try to guess its part of speech: whether it is a noun, a verb, an
adjective, and so on. Knowing the role of each word in the sentence will help us start to figure out
what the sentence is talking about.
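
A rough sketch of part-of-speech tagging with the same assumed spaCy pipeline; token.pos_ holds
the predicted coarse part of speech (NOUN, VERB, ADJ and so on):

    import spacy
    nlp = spacy.load("en_core_web_sm")   # assumes the model is installed

    doc = nlp("London is the capital and most populous city of England "
              "and the United Kingdom.")

    # Print each token together with its predicted part of speech
    for token in doc:
        print(token.text, token.pos_)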

Step 4: Text Lemmatization


In English (and most languages), words appear in different forms. Look at these two sentences:

I had a pony.

I had two ponies.

Both sentences talk about the noun pony, but they are using different inflections. When working
with text in a computer, it is helpful to know the base form of each word so that you know that
both sentences are talking about the same concept. Otherwise, the strings “pony” and “ponies”
look like two totally different words to a computer. Lemmatization works out the most basic
form, or lemma, of each word, so that both sentences map back to the word “pony”.
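
A short sketch of lemmatization under the same spaCy assumption; token.lemma_ gives the base
form of each word:

    import spacy
    nlp = spacy.load("en_core_web_sm")   # assumes the model is installed

    for sentence in ("I had a pony.", "I had two ponies."):
        doc = nlp(sentence)
        # token.lemma_ is the base (dictionary) form of the word
        print([token.lemma_ for token in doc])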

Step 5: Identifying Stop Words

Next, we want to consider the importance of each word in the sentence. English has a lot of filler
words that appear very frequently, like “and”, “the”, and “a”. When doing statistics on text, these
words introduce a lot of noise since they appear far more frequently than other words. Some
NLP pipelines will flag them as stop words, that is, words that you might want to filter out before
doing any statistical analysis.

Stop words are usually identified just by checking a hardcoded list of known stop words. But
there’s no standard list of stop words that is appropriate for all applications; the list of words to
ignore can vary depending on your application.
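
A small sketch of stop-word filtering, assuming the same spaCy model; its built-in stop-word list
is just one possible choice and could be swapped for an application-specific list:

    import spacy
    nlp = spacy.load("en_core_web_sm")   # assumes the model is installed

    doc = nlp("London is the capital and most populous city of England "
              "and the United Kingdom.")

    # Drop stop words and punctuation, keeping only the content words
    content_words = [t.text for t in doc if not t.is_stop and not t.is_punct]
    print(content_words)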

Step 6: Dependency Parsing

The next step is to figure out how all the words in our sentence relate to each other. This is called
dependency parsing.

The goal is to build a tree that assigns a single parent word to each word in the sentence. The
root of the tree will be the main verb in the sentence.

Finding Noun Phrases

So far, we’ve treated every word in our sentence as a separate entity. But sometimes it makes
more sense to group together the words that represent a single idea or thing. We can use the
information from the dependency parse tree to automatically group together words that are all
talking about the same thing.
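
A brief sketch of dependency parsing and noun-phrase grouping with the assumed spaCy pipeline;
each token stores its relation (token.dep_) and its parent word (token.head), and doc.noun_chunks
groups words that describe a single thing:

    import spacy
    nlp = spacy.load("en_core_web_sm")   # assumes the model is installed

    doc = nlp("London is the capital and most populous city of England "
              "and the United Kingdom.")

    # Each token points to its parent ("head") in the dependency tree
    for token in doc:
        print(token.text, token.dep_, "->", token.head.text)

    # Noun phrases group together words that describe a single thing
    print([chunk.text for chunk in doc.noun_chunks])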

Step 7: Named Entity Recognition

The goal of Named Entity Recognition, or NER, is to detect and label the nouns in a sentence with
the real-world concepts that they represent. After running each token of our sentence through an
NER tagging model, “London”, “England”, and “the United Kingdom” would each be labeled as
geographic entities.
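
A minimal NER sketch under the same spaCy assumption; doc.ents holds the detected entities and
ent.label_ their types (the exact labels depend on the model used):

    import spacy
    nlp = spacy.load("en_core_web_sm")   # assumes the model is installed

    doc = nlp("London is the capital and most populous city of England "
              "and the United Kingdom.")

    # Detected named entities and their predicted labels
    for ent in doc.ents:
        print(ent.text, ent.label_)
    # London, England and the United Kingdom are typically tagged GPE
    # (geopolitical entity).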

Step 8: Coreference Resolution

At this point, we already have a useful representation of our sentence. We know the parts of
speech for each word, how the words relate to each other and which words are talking about
named entities.

However, we still have one big problem. English is full of pronouns, words like he, she, and it.
These are shortcuts that we use instead of writing out names over and over in each sentence.
Humans can keep track of what these words represent based on context. But our NLP model
doesn’t know what pronouns mean because it only examines one sentence at a time.
Let’s look at the third sentence in our document:

“It was founded by the Romans, who named it Londinium.”

If we parse this with our NLP pipeline, we’ll know that “it” was founded by the Romans. But it’s a
lot more useful to know that “London” was founded by the Romans.

As a human reading this sentence, you can easily figure out that “it” means “London”. The goal
of coreference resolution is to figure out this same mapping by tracking pronouns across
sentences. We want to figure out all the words that are referring to the same entity.
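
Core spaCy does not ship a coreference resolver, so the sketch below is only a toy: it naively
treats the first named entity in the document as the topic and maps every “it” back to it, which
happens to work for this example but would fail on real text:

    import spacy
    nlp = spacy.load("en_core_web_sm")   # assumes the model is installed

    doc = nlp("London is the capital and most populous city of England "
              "and the United Kingdom. It was founded by the Romans, "
              "who named it Londinium.")

    # Toy heuristic: assume every "it" refers to the first named entity
    topic = doc.ents[0].text if doc.ents else None

    resolved = []
    for token in doc:
        if token.lower_ == "it" and topic is not None:
            resolved.append(token.text + " [" + topic + "]")
        else:
            resolved.append(token.text)
    print(" ".join(resolved))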

b) Using a diagram, discuss the goal-based agent environment


These kinds of agents take decisions based on how far they currently are from their goal
(a description of desirable situations). Every action they take is intended to reduce their distance
from the goal. This gives the agent a way to choose among multiple possibilities,
selecting the one that reaches a goal state. The knowledge that supports its decisions is
represented explicitly and can be modified, which makes these agents more flexible.
They usually require search and planning, and the goal-based agent’s behavior can easily be
changed.
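
As a purely illustrative sketch (the names and numbers below are made up), a goal-based agent
can be written as a loop that, at every step, picks the action whose resulting state is closest to
the goal:

    # Toy goal-based agent on a one-dimensional corridor with the goal at cell 7.
    GOAL = 7
    ACTIONS = {"left": -1, "right": +1, "stay": 0}

    def distance_to_goal(position):
        return abs(GOAL - position)

    def choose_action(position):
        # Choose the action whose resulting state is closest to the goal
        return min(ACTIONS, key=lambda a: distance_to_goal(position + ACTIONS[a]))

    position = 2
    while position != GOAL:
        action = choose_action(position)
        position += ACTIONS[action]
        print(action, "->", position)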

c) Four applications of natural language processing

1. As an enhancement for grammar-checking software or writing platforms.


2. A better human-computer interface that could convert from a natural language into a
computer language and vice versa. A natural language system could be the interface to a
database system, such as for a travel agent to use in making reservations. A visually
impaired person could use a natural language system to interact with computers.
3. As an add-on for a language translation program that could translate from one human
language to another (say, for instance, English to Swahili). Natural language processing
allows for a rudimentary translation before a human translator gets involved, which
would cut down the time needed for translating documents.
4. A computer that could understand and process human language, enabling it to convert
mass information from e-books or websites into structured data before storing it in a
huge database.

2. a) Using the vacuum world problem, define the problem using five components
States – 8 states (2 agent locations × 4 possible dirt configurations)
Initial state – any state can be designated as the initial state
Actions – Left, Right, and Suck
Transition model – the actions have their expected effects, giving the complete state space
Goal test – checks whether both squares are clean
Path cost – each step costs 1, so the path cost is the number of steps
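
A toy sketch of how these components could be written down in code (illustrative only; here a
state is a pair of the agent’s location and the set of dirty squares):

    # Two-square vacuum world: locations "A" and "B".
    LOCATIONS = ("A", "B")
    ACTIONS = ("Left", "Right", "Suck")

    # 2 agent locations x 4 dirt configurations = 8 states
    STATES = [(loc, frozenset(dirt))
              for loc in LOCATIONS
              for dirt in (set(), {"A"}, {"B"}, {"A", "B"})]

    def result(state, action):
        """Transition model: the state that results from doing `action`."""
        loc, dirt = state
        if action == "Left":
            return ("A", dirt)
        if action == "Right":
            return ("B", dirt)
        return (loc, dirt - {loc})        # "Suck" cleans the current square

    def goal_test(state):
        return not state[1]               # goal: both squares are clean

    def step_cost(state, action, next_state):
        return 1                          # each step costs 1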
b) Discuss the requirements of a knowledge representation

Representational Adequacy – the ability to represent all the different kinds of knowledge
that might be needed in that domain.

Inferential Adequacy – the ability to manipulate the representational structures to derive
new structures from existing structures.

Inferential Efficiency – the ability to incorporate additional information into the knowledge
structure which can be used to focus the attention of the inference mechanisms in the most
promising directions.

Acquisitional Efficiency – the ability to acquire new information easily. Ideally the agent
should be able to control its own knowledge acquisition, but direct insertion of information
by a ‘knowledge engineer’ would be acceptable. Finding a system that optimizes these for all
possible domains is not going to be feasible.

c) Discuss the advantages of using a database for knowledge representation

Naturalness of expression - Real experts rely on rules of thumb rather than on textbook
knowledge. Therefore, expert knowledge can be expressed as a set of production rules. Besides,
individual rules divide the knowledge into natural and readily intelligible chunks (the right grain
size).

Restricted syntax - Although this has some disadvantages concerning expressibility, there are
positive consequences too:

1. One can write a program that can read and change a set of production rules
2. One can write a program that helps in the acquisition of knowledge from a domain expert

3. It becomes easy to write a program that translates a production rule into natural language

The problem-solving process - Production rules determine what to do next on the basis of the
present state of working memory. This has advantages for the overall problem-solving process:

1. Minimal changes to working memory can lead to important focus-shifts in the process
2. PR systems can pursue a large number of different lines of reasoning at any one time.

So, in fact, PR systems give you the best of both worlds: tight focus on the current state of
working memory and the flexibility to pursue many lines of reasoning.

Modularity - Several architectural properties of production rule systems lead to a very modular
organization of the knowledge base: permanent and temporary knowledge are kept separate, the
rules are structurally independent of each other, and the interpreter is independent of the
knowledge.

This modularity has advantages:

1. Facilitates the construction and maintenance of rule bases

2. Makes incremental construction possible

3. Easy to debug: it is possible to track down which rules lead to wrong answers (because of their
contents or because of the timing of their triggering).
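
As a rough illustration of these points (all rule and fact names below are invented), a tiny
forward-chaining production-rule interpreter fits in a few lines: the rules are separate, readable
chunks, working memory is kept apart from the permanent rules, and new rules can be added
without touching the interpreter:

    # Tiny forward-chaining production-rule interpreter (illustrative only).
    # Each rule is (set of condition facts, fact to conclude).
    rules = [
        ({"has_fever", "has_cough"}, "may_have_flu"),
        ({"may_have_flu"},           "recommend_rest"),
    ]

    working_memory = {"has_fever", "has_cough"}    # temporary knowledge

    changed = True
    while changed:                 # keep firing rules until nothing new is added
        changed = False
        for conditions, conclusion in rules:
            if conditions <= working_memory and conclusion not in working_memory:
                working_memory.add(conclusion)
                changed = True

    print(working_memory)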
