
Name: MUSTAPHA FATIMAH OMOLABAKE

Matric No.: 1905022005


Course Title: Artificial Intelligence
Course Code: COM (423)

ASSIGNMENT I

1. Procedures on how to use the following MATLAB toolboxes


i. ANN (Artificial Neural Network)
a. Access and prepare your data
b. Create the artificial neural network
c. Configure the network’s inputs and outputs
d. Tune the network parameters (the weights and biases) to optimize performance
e. Train the network
f. Validate the network’s results
g. Integrate the network into a production system
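These steps are carried out with the toolbox's own functions inside MATLAB. As a hedged illustration of the same workflow in Python (not the MATLAB toolbox itself), the sketch below uses scikit-learn's MLPClassifier; the Iris dataset and all parameter values are arbitrary choices for the example.

# Minimal sketch of the ANN workflow above using scikit-learn (illustration only,
# not the MATLAB Neural Network Toolbox). Dataset and parameters are arbitrary choices.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier

# a. Access and prepare the data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# b./c. Create and configure the network (one hidden layer of 10 neurons)
net = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000, random_state=0)

# d./e. Tune the weights and biases by training the network
net.fit(X_train, y_train)

# f. Validate the network's results on held-out data
print("validation accuracy:", net.score(X_test, y_test))

# g. Integration would mean exporting the trained model for use in a production
#    system, e.g. with joblib.dump(net, "model.joblib")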

ii. Fuzzy logic


a. Design Mamdani and Sugeno fuzzy inference systems.
b. Add or remove input and output variables.
c. Specify input and output membership functions.
d. Define fuzzy if-then rules.
e. Select fuzzy inference functions for:
- And operations
- Or operations
- Implication
- Aggregation
- Defuzzification
f. Adjust input values and view associated fuzzy inference diagrams.
g. View output surface maps for fuzzy inference systems.
h. Export fuzzy inference systems to the MATLAB® workspace.
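As a hedged illustration of the same workflow outside MATLAB, the sketch below builds a small Mamdani-style system with the Python scikit-fuzzy package (not the Fuzzy Logic Toolbox itself); the tipping-style variables, membership functions, and rules are made up for the example.

# Sketch of a Mamdani-style fuzzy inference system with scikit-fuzzy
# (an analogue of the MATLAB Fuzzy Logic Toolbox, not the toolbox itself).
import numpy as np
import skfuzzy as fuzz
from skfuzzy import control as ctrl

# b. Input and output variables (hypothetical tipping example)
service = ctrl.Antecedent(np.arange(0, 11, 1), 'service')
tip = ctrl.Consequent(np.arange(0, 26, 1), 'tip')

# c. Membership functions (three auto-generated sets for the input)
service.automf(3)                                # poor, average, good
tip['low'] = fuzz.trimf(tip.universe, [0, 0, 13])
tip['high'] = fuzz.trimf(tip.universe, [13, 25, 25])

# d. Fuzzy if-then rules
rules = [ctrl.Rule(service['poor'], tip['low']),
         ctrl.Rule(service['good'], tip['high'])]

# e./f. Build the system, adjust an input value and run inference
system = ctrl.ControlSystemSimulation(ctrl.ControlSystem(rules))
system.input['service'] = 7.5
system.compute()
print(system.output['tip'])                      # defuzzified (centroid) output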

iii. Genetic algorithm


The following outline summarizes how the genetic algorithm works:
1. The algorithm begins by creating a random initial population.
2. The algorithm then creates a sequence of new populations. At each step, the algorithm
uses the individuals in the current generation to create the next population. To create the new
population, the algorithm performs the following steps:
a. Scores each member of the current population by computing its fitness value.
These values are called the raw fitness scores.
b. Scales the raw fitness scores to convert them into a more usable range of values.
These scaled values are called expectation values.
c. Selects members, called parents, based on their expectation.
d. Some of the individuals in the current population that have lower fitness are chosen
as elite (the algorithm minimizes, so lower fitness values are better). These elite individuals are passed to the next population.
e. Produces children from the parents. Children are produced either by making
random changes to a single parent—mutation—or by combining the vector entries of a pair of
parents—crossover.
f. Replaces the current population with the children to form the next generation.
3. The algorithm stops when one of the stopping criteria is met.
4. The algorithm takes modified steps for linear and integer constraints.
5. The algorithm is further modified for nonlinear constraints.
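The outline above can be illustrated with a short, self-contained Python sketch (a toy minimization, not MATLAB's ga; the population size, rates, and fitness function are arbitrary example choices):

# Plain-Python sketch of the genetic algorithm outline above (toy minimisation).
import random

def fitness(x):                       # raw fitness score (lower is better)
    return sum(xi ** 2 for xi in x)

def make_individual(n=5):
    return [random.uniform(-5, 5) for _ in range(n)]

def crossover(a, b):                  # combine vector entries of two parents
    return [random.choice(pair) for pair in zip(a, b)]

def mutate(x, rate=0.1):              # random changes to a single parent
    return [xi + random.gauss(0, 1) if random.random() < rate else xi for xi in x]

# 1. the algorithm begins with a random initial population
population = [make_individual() for _ in range(50)]

for generation in range(100):
    # 2a/2b. score and rank the population (ranking stands in for scaled expectation)
    ranked = sorted(population, key=fitness)
    # 2d. keep a few elite individuals unchanged
    elite = ranked[:2]
    # 2c/2e. select parents from the better half and produce children
    parents = ranked[:len(ranked) // 2]
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(len(population) - len(elite))]
    # 2f. replace the current population to form the next generation
    population = elite + children

# 3. stop after a fixed number of generations (one possible stopping criterion)
print("best individual:", min(population, key=fitness))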
iv. Image processing
The Image Processing Toolbox provides a set of functions and applications for image processing,
analysis, and visualization. There are many functions available for image analysis, image
segmentation, image enhancement, noise reduction, geometric transformations, and image
registration. Moreover, many functions support multicore and multiprocessor CPUs, and GPUs.
The Image Processing Toolbox provides many GPU-enabled functions. To perform an image
processing operation on a GPU, you can apply the following steps:
a. Transfer data, for example, an image, from the CPU to the GPU by creating gpuArray objects.
b. Perform the image processing operation on the GPU either by using the GPU-enabled functions
that accept a gpuArray object or by performing element-wise or pixel-based operations on a GPU
using arrayfun and bsxfun.
c. Transfer data back to the CPU from the GPU using the gather function.
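The three steps above use MATLAB's gpuArray and gather. A hedged Python analogue of the same pattern is sketched below with CuPy (this assumes a CUDA-capable GPU and the cupy package are available; the Gaussian filter is just one example operation):

# Python analogue of the three GPU steps above, using CuPy instead of MATLAB's
# gpuArray/gather (illustration only; requires a CUDA-capable GPU).
import numpy as np
import cupy as cp
from cupyx.scipy import ndimage as gpu_ndimage

image = np.random.rand(512, 512).astype(np.float32)   # stand-in for a real image

# a. transfer the image from the CPU to the GPU
gpu_image = cp.asarray(image)

# b. perform the image processing operation on the GPU (Gaussian smoothing here)
gpu_smoothed = gpu_ndimage.gaussian_filter(gpu_image, sigma=2.0)

# c. transfer the result back to the CPU
smoothed = cp.asnumpy(gpu_smoothed)
print(smoothed.shape)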

2. A review on how to use WEKA for machine learning

WEKA (Waikato Environment for Knowledge Analysis) is a comprehensive suite of Java class libraries that implements many state-of-the-art machine learning and data mining algorithms. WEKA provides implementations of learning algorithms that you can easily apply to a dataset. It also includes a set of tools for transforming datasets, for example discretization and sampling. This review shows how to preprocess a dataset, feed it into a learning scheme, and analyse the resulting classifier and its performance, all within Weka and without writing any program code.
1. Download and Install Weka
Visit the Weka Download page and locate a version of Weka suitable for your computer
(Windows, Mac, or Linux).

2. Start Weka
Start Weka. This may involve finding it in your program launcher or double-clicking on the weka.jar file. This will start the Weka GUI Chooser. The GUI Chooser lets you choose one of the Explorer, Experimenter, KnowledgeFlow, and the Simple CLI (command line interface).

Click the “Explorer” button to launch the Weka Explorer.


This GUI lets you load datasets and run classification algorithms. It also provides other features,
like data filtering, clustering, association rule extraction, and visualization, but we won’t be using
these features right now.

3. Open the data/iris.arff Dataset


Click the “Open file…” button to open a data set and double click on the “data” directory.
Weka provides a number of small common machine learning datasets that you can use to practice
on.
Select the “iris.arff” file to load the Iris dataset.

4. Select and Run an Algorithm


Now that you have loaded a dataset, it’s time to choose a machine learning algorithm to model the
problem and make predictions. Click the “Classify” tab. This is the area for running algorithms
against a loaded dataset in Weka. You will note that the “ZeroR” algorithm is selected by default.
Click the “Start” button to run this algorithm.

5. Review Results
After the run finishes, the Classifier output area shows the evaluation results, including the classification accuracy and a confusion matrix.
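Weka itself is driven through the GUI as described above, but the same baseline experiment can be sketched in Python for comparison. The code below uses scikit-learn's DummyClassifier, which, like ZeroR, always predicts the majority class (an illustration only, not Weka's own API):

# ZeroR-style majority-class baseline on the Iris data, using scikit-learn
# (for comparison with the Weka Explorer run; not Weka itself).
from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
baseline = DummyClassifier(strategy="most_frequent")   # behaves like ZeroR
scores = cross_val_score(baseline, X, y, cv=10)        # 10-fold cross-validation
print("baseline accuracy: %.2f" % scores.mean())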

3. A comprehensive overview of the Orange data mining tool for:


a. Sentiment analysis
Sentiment analysis (SA) is based on natural language processing (NLP) techniques and is used to extract users' feelings and opinions about products or services. Opinion mining is another name for sentiment analysis. Sentiment analysis is very useful in the decision-making process. With greater Internet use, SA has become a powerful tool for studying customers' opinions about any product or service offered by a business organization or company. Several approaches and techniques for sentiment analysis have emerged in recent years. This overview describes the techniques used for SA, the approaches used for SA, and the applications of sentiment analysis.

[Figure: Sentiment analysis process on product reviews.]

Sentiment Analysis can be considered a classification process as illustrated in the diagram above.
There are three main classification levels in SA: document-level, sentence-level, and aspect-level
SA.
(i) Document-level SA aims to classify an opinion document as expressing a positive or
negative opinion or sentiment. It considers the whole document a basic information unit
(talking about one topic).
(ii) Sentence-level SA aims to classify sentiment expressed in each sentence. The first step is to
identify whether the sentence is subjective or objective. If the sentence is subjective,
Sentence-level SA will determine whether the sentence expresses positive or negative
opinions. Wilson et al. have pointed out that sentiment expressions are not necessarily
subjective in nature. However, there is no fundamental difference between document and
sentence-level classifications because sentences are just short documents. Classifying text at the document level or at the sentence level does not provide the detailed opinions on all aspects of the entity that many applications need; to obtain these details, we need to go to the aspect level.
(iii) Aspect-level SA aims to classify the sentiment with respect to the specific aspects of
entities. The first step is to identify the entities and their aspects. The opinion holders can
give different opinions for different aspects of the same entity like this sentence “The voice
quality of this phone is not good, but the battery life is long”. This survey tackles the first
two kinds of SA.
[Figure: Sentiment classification techniques.]
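As a toy illustration of the sentence-level classification described in (ii) above, the Python sketch below scores sentences against a tiny hand-made lexicon (the word lists are made up for the example; real systems use much larger lexicons or trained models):

# Toy sketch of sentence-level sentiment classification with a hand-made lexicon.
POSITIVE = {"good", "great", "long", "excellent", "love"}
NEGATIVE = {"bad", "poor", "short", "terrible", "hate", "not"}

def sentence_sentiment(sentence):
    # crude tokenisation, then count positive and negative lexicon hits
    words = sentence.lower().replace(",", " ").replace(".", " ").split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

review = "The voice quality of this phone is not good, but the battery life is long"
for sentence in review.split(","):
    print(sentence.strip(), "->", sentence_sentiment(sentence))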

b. Topic modeling
Topic Modelling discovers abstract topics in a corpus based on clusters of words found in each
document and their respective frequency. A document typically contains multiple topics in
different proportions, thus the widget also reports on the topic weight per document.
The widget wraps gensim’s topic models (LSI, LDA, HDP).
The first, LSI, can return both positive and negative words (words that are in a topic and those that
aren’t) and concurrently topic weights, that can be positive or negative. As stated by the main
gensim’s developer, Radim Řehůřek: “LSI topics are not supposed to make sense; since LSI
allows negative numbers, it boils down to delicate cancellations between topics and there’s no
straightforward way to interpret a topic."
LDA can be more easily interpreted, but is slower than LSI. HDP has many parameters - the
parameter that corresponds to the number of topics is Top level truncation level (T). The smallest
number of topics that one can retrieve is 10.

1. Topic modelling algorithm:

o Latent Semantic Indexing. Returns both negative and positive words and topic
weights.
o Latent Dirichlet Allocation
o Hierarchical Dirichlet Process
- Parameters for the algorithm. LSI and LDA accept only the number of topics
modelled, with the default set to 10. HDP, however, has more parameters. As this algorithm is
computationally very demanding, we recommend trying it on a subset of the data, or setting all the
required parameters in advance and only then running the algorithm (connecting the input to the widget).
o First level concentration (γ): distribution at the first (corpus) level of Dirichlet
Process
o Second level concentration (α): distribution at the second (document) level of
Dirichlet Process
o The topic Dirichlet (α): concentration parameter used for the topic draws
o Top level truncation (Τ): corpus-level truncation (no of topics)
o Second level truncation (Κ): document-level truncation (no of topics)
o Learning rate (κ): step size
o Slow down parameter (τ)
- Produce a report.
- If Commit Automatically is on, changes are communicated automatically.
Alternatively press Commit.
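Since the widget wraps gensim's topic models, a minimal gensim LDA run can illustrate what it does under the hood (the tiny corpus and the number of topics below are arbitrary example values):

# Minimal sketch of LDA topic modelling with gensim, the library the widget wraps.
from gensim import corpora
from gensim.models import LdaModel

documents = [["fuzzy", "logic", "membership", "rules"],
             ["neural", "network", "weights", "training"],
             ["topic", "model", "corpus", "words"],
             ["neural", "network", "training", "data"]]

dictionary = corpora.Dictionary(documents)                 # map words to ids
corpus = [dictionary.doc2bow(doc) for doc in documents]    # bag-of-words vectors

lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, passes=10, random_state=0)
for topic_id, words in lda.print_topics():
    print(topic_id, words)                                 # top words and weights per topic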

c. Word embedding
Word embedding — the mapping of words into numerical vector spaces — has proved to be an
incredibly important method for natural language processing (NLP) tasks in recent years, enabling
various machine learning models that rely on vector representation as input to enjoy richer
representations of text input. These representations preserve more semantic and syntactic
information on words, leading to improved performance in almost every imaginable NLP task.
The Document Embedding widget parses the n-grams of each document in the corpus, obtains an embedding for each n-gram using a pretrained model for the chosen language, and produces one vector per document by aggregating the n-gram embeddings with one of the offered aggregators. Note that the method works on any n-grams, but it gives the best results when the corpus is preprocessed so that the n-grams are words (because the model was trained to embed words).
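A minimal sketch of this aggregation idea, assuming word (1-gram) embeddings and mean aggregation, is shown below; the tiny embedding table is made up for the example, whereas the widget uses a large pretrained model for the chosen language.

# Sketch of the aggregation step: one vector per document, obtained by
# averaging word embeddings (hypothetical 3-dimensional vectors).
import numpy as np

embeddings = {
    "battery": np.array([0.9, 0.1, 0.0]),
    "life":    np.array([0.8, 0.2, 0.1]),
    "is":      np.array([0.1, 0.1, 0.1]),
    "long":    np.array([0.7, 0.0, 0.2]),
}

def embed_document(words, aggregator=np.mean):
    # look up the vector of each known word and aggregate them into one vector
    vectors = [embeddings[w] for w in words if w in embeddings]
    return aggregator(vectors, axis=0) if vectors else np.zeros(3)

print(embed_document("the battery life is long".split()))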

1. Widget parameters:

o Language: the widget will use a model trained on documents in the chosen language.
o Aggregator: the operation performed on the n-gram embeddings to aggregate them into a
single document vector.
- Cancel current execution.
- If Apply automatically is checked, changes in parameters are sent automatically.
Alternatively press Apply.

Document embedding approaches

A possible way to map the field is into the following four prominent approaches:
1. Summarizing word vectors: This is the classic approach. Bag-of-words does exactly this
for one-hot word vectors, and the various weighting schemes you can apply to it are variations on
this way of summarizing word vectors. However, this approach is also valid when used with the
most state-of-the-art word representations (usually by averaging instead of summing), especially
when word embeddings are optimized with this use in mind, and it can stand its ground against any
of the fancier methods covered here.

2. Topic modelling: While this is not usually the main application of topic modelling
techniques such as LDA and PLSI, they inherently generate a document embedding space that is
meant to model and explain the word distribution in the corpus, and whose dimensions can be seen
as latent semantic structures hidden in the data; they are thus useful in our context. This approach
is not covered in detail here (except for a brief introduction to LDA), since it is both well
represented by LDA and generally well known.

3. Encoder-decoder models: This is the newest unsupervised addition to the scene, featuring
the likes of doc2vec and skip-thought (a doc2vec sketch follows this list). While this approach has
been around since the early 2000s, under the name of neural probabilistic language models, it has
gained new life recently with its successful application to word embedding generation, and current
research focuses on how to extend its use to document embedding. This approach gains more than
the others from the increasing availability of large unlabeled corpora.

4. Supervised representation learning: This approach owes its life to the great rise (or
resurgence) of neural network models, and their ability to learn rich representations of input data
using various non-linear multi-layer operators, which can approximate a wide range of mappings.
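As a hedged sketch of the encoder-decoder family (approach 3 above), the code below trains a tiny doc2vec model with gensim; the toy corpus and hyperparameters are arbitrary example values.

# Minimal doc2vec sketch with gensim (illustration of approach 3; toy corpus only).
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

corpus = [TaggedDocument(words=doc.split(), tags=[i]) for i, doc in enumerate([
    "the battery life is long",
    "the voice quality is not good",
    "great phone with a long battery life",
])]

model = Doc2Vec(corpus, vector_size=20, min_count=1, epochs=50)
print(model.infer_vector("battery life is good".split()))   # embedding of a new document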
